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Trial and triumph 


The success of an Ebola vaccine trial shows that clinical trials can be done under the difficult field 
conditions of an epidemic — if there is enough political and regulatory will. 


Ebola vaccine in Guinea has reported a promising outcome. It 
is fantastic news — even the most cautious disease experts are 
hopeful that a corner has been turned. 

What now? In a special collection of articles this week, Nature 
analyses the vaccine breakthrough and looks more broadly at the 
prospects for future control of epidemic threats. It is not all good 
news, and there are bound to be setbacks, but those who value the 
role of research in improving human welfare — and those who argue 
for broader recognition of that role among policymakers — can now 
walk a little taller. 

Make no mistake: conducting an efficacy trial of a vaccine or a drug 
during an epidemic is difficult, to put it mildly. In the past, delays in 
getting regulatory approval for trials meant that outbreaks were usually 
over before the trial even started, so drugs and vaccines needed to treat 
the outbreak, or future ones, could not be tested. 

Running a clinical efficacy trial in the arduous field conditions 
of an epidemic zone is no mean feat either. Yet against the odds, 
an international team of researchers not only did just that, but also 
showed that one shot of the vaccine had 100% efficacy — none of 
those vaccinated at the start of the trial developed Ebola ten days 
after receiving the vaccine (A. M. Henao-Restrepo et al. Lancet 
http://dx.doi.org/10.1016/S0140-6736(15)61117-5; 2015). 

That such a vaccine could be clinically tested — a process that 


Te is a big week in the fight against Ebola: a clinical trial of an 


usually takes years — in a short space of time and without the 
facilities of a sophisticated research hospital must rewrite the rules for 
how drug trials for infectious-disease threats are conducted. Faced 
with the urgency of Ebola, international collaborations of scientists, 
regulators, pharmaceutical companies and non-governmental organi- 
zations — and, to its credit, the World Health Organization, which had 
a leading role — pulled together with unprecedented speed to push 
vaccines and drugs through testing and into field trials. 

Roll-out of the vaccine to more people will provide data to confirm 
its effectiveness. But by vaccinating the families, friends, health-care 
workers and others who come into contact with infected people, Ebola 
outbreaks could be stopped in their tracks — the same strategy that was 
used to eradicate smallpox in the 1970s. This means that this vaccine 
can, in principle, be deployed immediately to help to end the Ebola 
epidemic in West Africa. As aptly conveyed by the trial’s French name, 
‘Ebola, ¢a suffit? (Ebola, that’s enough’), it is time to finish the job. 

The job remains, because even if Ebola has faded from the head- 
lines, it is far from over. Eighteen months after it began, the epidemic 
continues to cause 20-30 cases a week. It could flare up at any time or 
spread to as-yet-unaffected countries in the region, taking the situa- 
tion back to square one. Although vaccines will need to be developed 
against the four other species of Ebola virus, the efficacy of this vaccine 
against the Zaire species — if confirmed — means that never again 
should an Ebola epidemic occur on the same scale as in West Africa. = 


Driving test 


‘Gene drive’ techniques have the potential to alter 
whole populations. Regulators must catch up. 


a — then hypothetical — way to use cutting-edge genetic tech- 

niques to rapidly alter entire populations of plants or animals. 
Such a technique, called a gene drive, could lead to unanticipated 
ecological consequences, they cautioned (K. A. Oye et al. Science 
345, 626-628; 2014). The authors discussed safety guidelines, made 
general policy recommendations, and met with some criticism: why 
raise alarm over a technique that did not yet exist? 

Less than a year later, it did exist. Two groups have now published 
examples of gene drives engineered using CRISPR, a versatile and rela- 
tively easy system that allows researchers to make changes to genomes 
with pinpoint precision (see page 16). Crucially, it enabled a designated 
mutation to copy itself from one chromosome in a pair to the other, 
ensuring that it was passed to offspring and allowing it to spread rapidly 


| ast year, researchers and policy experts expressed concerns about 


through a population (V. M. Gantz and E. Bier Science 348, 442-444 
(2015) and J. E. DiCarlo etal. Preprint at http://doi.org/6k2; 2015). 

Engineering a lab animal or agricultural crop is one thing. Wield- 
ing the power to alter an entire wild population is quite another. The 
process understandably raises concern. But it could hold great benefit: 
mosquitoes could be tweaked so that they cannot carry malaria, or an 
endangered species could be saved by wiping out an invasive competitor. 

Last week, the debate gained momentum when the US National 
Academy of Sciences held its first meeting to evaluate the potential 
benefits and risks of gene drives. As is often the case by the time such 
controversies start to attract mainstream attention, specialist research- 
ers have been thrashing out the issue for years. These discussions have 
already produced various sets of guidelines on the use of gene drives, 
and the academy and others should use this literature as a starting point. 

What is new is the advent of CRISPR. This adds extra dimensions 
to the debate, because it makes gene drives much easier and could 
dramatically accelerate the timeline for a potential release — acciden- 
tal or intentional. Researchers and funding agencies should take note, 
and efforts to understand the ecological consequences of a gene drive 
should be made an urgent priority. Regulators and the wider world 
need to keep pace with the rapid development of CRISPR technology, 
and there is little time to waste. m 
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to be difficult, and so it is proving. But the world is already talking 

about what to do ‘post-Ebola. Although there is general agree- 
ment on what needs to be improved — chiefly, local capacity and 
health care — to ensure a better response to the next epidemic there 
have been few concrete actions. Why? Because all global movements 
must start with a nucleus, around which broader efforts can aggregate. 
This has not yet been established. Here, I describe how the Pasteur 
Institute in Paris could form part of such a nucleus, and collaborate 
with the relevant national and international stakeholders. 

The institute has joined with the Chinese Center For Disease 
Control and Prevention (China CDC) to convert talking into tangible 
action. In consultation with the governments of Guinea, Liberia and 
Sierra Leone — the countries that have been dev- 
astated by Ebola — our collaboration will invest 
in public health, education and research in the 
region to address the urgent needs highlighted 
by the Ebola epidemic. 

Our main aim is to revitalize these countries’ 
overextended health systems within five years, 
by improving science education and the train- 
ing of health-care professionals. Our secondary 
objective is to strengthen research facilities, field 
surveillance and laboratory analysis to track and 
combat emerging and re-emerging infectious 
disease. 

The Pasteur Institute has worked in West Africa 
for almost a century, partly as a legacy of France's 
history in the region. It has sites in Senegal's capital 
Dakar and in Abidjan, capital of Céte d'Ivoire, and 
it will soon open a centre in Conakry, Guinea. As 
such, the institute has been deeply involved in the 
international efforts to fight Ebola, particularly in Guinea. 

The China CDC gained enormous experience in infectious-disease 
response during the 2003 outbreak of severe acute respiratory syn- 
drome (SARS), and it has applied this expertise in the Ebola crisis, 
working under the leadership of deputy director-general George Gao. 
China and the Pasteur Institute have worked together during the crisis, 
and this collaboration forms the foundation of the new initiative. 

The Ebola outbreak exposed the lack of local expertise. There was a 
shortage of skilled scientists and health-care workers able to diagnose 
the disease, for example. Our initiative will aim to help local authorities 
to revamp graduate training in science and public health in regional 
universities — including in Senegal and Céte dIvoire — and through- 
out the Pasteur network in Africa. 


S tamping out the Ebola outbreak in West Africa was always going 


We aim to train more students inemerging NATURE.COM 
infectious diseases, global public health and _ For Nature's special 
veterinary medicine. We will offer direct train- _ on Ebola, see: 


ing, long-distance learning and internships _ nature.com/ebola 
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Train Africa’s scientists 
in crisis response 


To prevent future epidemics, a new international effort must boost West 
Africa’s scientific and public-health capacity, says Christian Bréchot. 


for students to work in Pasteur labs and with our teams in the field. 
Where needed, we will help to build facilities with the latest training 
technologies. 

Surveillance has proved a real problem in the Ebola outbreak, and an 
important goal of our initiative is to train a substantial pool of skilled 
local professionals in techniques such as epidemiology so that they 
can do research-based disease tracking. By working with international 
partners we plan to improve access to modern equipment for medical 
biologists and other scientists in Africa. 

We will also work to provide extra training for existing African 
scientists by funding postdoctoral fellowships and training at approved 
laboratories and facilities in West Africa. Today’s professors will train 
tomorrow’s students, so it is crucial that we establish more research 
opportunities for local scientists. We hope that 
encouraging them to develop projects with 
international partners will generate a virtuous 
circle to ensure the sustainability ofa research 
programme for public health. 

The fellowships will cover a range of scientific 
disciplines, from outbreak investigation and rapid 
response to quality assurance. There are currently 
not enough — if any — of these opportunities in 
poor countries such as Guinea and Liberia. These 
nations need a new generation of African doctors, 
nurses, lab technicians and PhD-trained scien- 
tists. We will also provide funding to encourage 
the brightest African postdoctoral scientists who 
have trained abroad to return. 

It is important that response to future disease 
outbreaks is informed by new knowledge about 
basic science. The joint project will fund col- 
laborations between African scientists and inter- 
national organizations that focus on the various components of an 
epidemic: the pathogen, vector, environment and host genetics. 

Infectious-disease outbreaks will continue to occur. We must capi- 
talize on the current political momentum and the will of international 
agencies and take steps to make lasting improvements to education, 
research and surveillance in West Africa and change the way the next 
outbreak unfolds. 

The Pasteur Institute and the China CDC want to provide a foun- 
dation for this international effort. We appeal to other governmen- 
tal and international organizations, African networks and funders 
to join us. Together, we can rebuild and establish the scientific 
and medical capacity and infrastructure that West Africa needs to 
recover from this outbreak, and to ensure that it is prepared for the 
next one. m SEE EDITORIAL P.5, NEWS FEATURE P.22, COMMENT P.27 & P.29 


Christian Bréchot is president of the Pasteur Institute in Paris. 
e-mail: christian. brechot@pasteur.fr 
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Selections from the 
scientific literature 


RESEARCH HIGHLIGHTS 


Only left-handed 
particles decay 


Only subatomic particles 
with a left-handed spin 
decay asa result of one of 

the fundamental forces, 
confirming that the Universe 
has a left-hand bias. 

A team working on the 
LHC) experiment looked at the 
decay of trillions of subatomic 
particles known as A’, baryons 
emerging from collisions at 
the Large Hadron Collider 
at CERN, Europe’ particle- 
physics laboratory near 
Geneva, Switzerland. During 
this decay, a bottom quark from 
the baryon can turn into an up 
quark. The team confirmed 
that the weak nuclear force — 
one of the four fundamental 
forces in the Universe — causes 
only bottom quarks with 
left-handed spin to decay into 
up quarks, as predicted by the 
standard model of particle 
physics. 

Previous measurements had 
suggested that right-handed 
quarks might also decay in this 
way, which, if true, would have 
called for new fundamental 
forces of nature. 

Nature Phys. http://doi.org/6kg 
(2015) 


Better estimates 
of extinction risk 


Using an improved method 
for calculating the extinction 
risk of species could lower the 
risk estimates for about one in 
ten threatened species. 

The influential Red List 
from the International Union 
for Conservation of Nature 
(IUCN) groups thousands 
of threatened plants and 
animals into different 
categories of extinction risk. 
Lucas Joppa at Microsoft 
Research in Redmond, 


New carnivorous plant found on Facebook 


A new species of insect-eating sundew plant 
(Drosera magnifica; pictured) has been 
identified after an amateur naturalist posted 
photographs of it on Facebook. 

Paulo Minatel Gonella at the University of Sao 
Paulo in Brazil and his colleagues were alerted to 
the photos on the social network, and travelled 
to southeastern Brazil to study the carnivorous 
species, which grows in a narrowly defined 


Washington, and his 
colleagues analysed different 
methods of calculating 

the ‘extent of occurrence’ 
(EOO) for 21,763 species 

of mammals, birds and 
amphibians on the Red List. 
The EOO is the total area 
over which a species might be 
found — the smaller the area, 
the greater the vulnerability 
of that species. 

Past assessments often used 
EOO calculation methods 
that the IUCN now considers 
outdated. The researchers 
found that applying the 
IUCN-approved method 
would lower the risk category 
of many threatened animals 
for 14-15% of mammals, 
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habitat on a single mountain. The plant has stems 
roughly 1.5 metres long and is the largest Drosera 
species in the Americas. The team found many 


insects trapped ina sticky substance produced by 


7-8% of birds and 12-15% of 
amphibians. 

Conserv. Biol. http://doi.org/6jq 
(2015) 


Microbes ramp up 
red-meat risk 


Microbes in the gut help to 
boost the risk of colon cancer 
when haem, the pigment found 
in red meat, is present. 

Haem in the diet has been 
linked to an increased risk of 
colon cancer — the pigment 
damages cells lining the gut, 
which leads to excessive 
cell proliferation. Noortje 
Tjssennagger at University 
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the plant’s red tentacles, which cover the leaves. 
The sundew is considered critically 

endangered, because coffee and eucalyptus 

plantations threaten its habitat. 

Phytotaxa 220, 257-267 (2015) 


Medical Center Utrecht in the 
Netherlands and her colleagues 
fed mice a diet containing 
haem and found that animals 
that also received antibiotics 
did not have this gut damage 
or increased cell proliferation. 
Haem increased the level of a 
bacterium called Akkermansia 
muciniphila, which breaks 
down the gut mucus lining, 
exposing gut cells to the 
damaging haem. Gut bacteria 
that produce sulfide also 
degrade this mucus barrier. 
Using a biomarker to 
monitor gut mucus degradation 
could bea way to gauge colon- 
cancer risk, the authors say. 
Proc. Natl Acad. Sci. USA 
http://doi.org/6jp (2015) 


PAULO MINATEL GONELLA 


LIVIA CARVALHO 


ETTORE MORONE 


Greenland glaciers 
have hidden depths 


Greenland’s glaciers may be 
more susceptible to global 
warming than was thought. 

Eric Rignot of the University 
of California, Irvine, and 
his colleagues used sonar to 
analyse the depths and profiles 
of three glaciers terminating in 
fjords in western Greenland. 
They found that the glaciers 
reach hundreds of metres 
farther down into the ocean 
than current maps suggest, 
allowing the ice to come into 
contact with a deep layer of 
warm Atlantic water. This 
leads to melting and the 
formation of deep cavities 
that probably boost the 
chances of large glacier 
chunks breaking off. 

The authors note that these 
processes are not included 
in current ice-sheet models, 
and suggest that estimates of 
Greenland’s contribution to 
sea-level rise will need to be 
increased. 
Geophys. Res. Lett. http://doi. 
org/6dn (2015) 


PALAEONTOLOGY 


Lizards evolved at 
snail’s pace 


Lizards of the Caribbean 
islands have changed little 
over millions of years. 

Only three fossils of Anolis 
lizards have previously been 
studied, but now Emma 
Sherratt at the University of 
New England in Armidale, 
Australia, and her colleagues 
have analysed a further 
17 fossils entombed in 
amber (pictured) from 
the Dominican Republic. 
The specimens, which are 
15 million to 20 million years 
old, revealed that the animals 
were uniquely adapted 
to the different parts 
of the trees that they 
inhabit, much as 
they are today. For 
instance, lizards 
that lived on twigs 
tended to be small 
with short limbs. 


Other fossils resembled larger 
lizards that live near the base 
of tree trunks and those found 
around the crowns of trees. 
The findings suggest that 
communities can remain 
remarkably stable over long 
evolutionary timescales. 
Proc. Natl Acad. Sci. USA 
http://doi.org/6hk (2015) 


GEOPHYSICS 


Ancient roots of 
Earth’s magnetism 


Earth may have developed a 
magnetic field as early as four 
billion years ago — more than 
halfa billion years earlier than 
was thought. 

John Tarduno at the 
University of Rochester in 
New York and his colleagues 
measured faint magnetic 
signals of iron-bearing 
minerals trapped inside zircon 
crystals up to four billion years 
old from the Jack Hills region 
of Western Australia. They 
found that the magnetic field 
fluctuated in strength, froma 
value similar to today’s field 
(around 25 microteslas) to 
about 12% of that. 

Anancient magnetic field 
when the planet was only 
about 500 million years old 
would have been a good, if 
imperfect, shield against the 
solar wind. This could have 
made the young planet 
more hospitable to life, the 
authors say. 

Science 349, 521-524 (2015) 


Scaling up pure 
graphene growth 


Researchers have found a 
way to grow and transfer 
crystals of graphene more 
efficiently compared with 
other methods. 

Pure graphene comprises 
one-atom-thick sheets of 
carbon that have desirable 
electronic properties, and 
is best made by stripping 
a single layer of atoms 
off a graphite crystal. 
However, the process 

is hard to scale up 
for industrial use 


RESEARCH HIGHLIGHTS BiiiSaiaa¢ 


SOCIAL SELECTION 


Popular topics 
_on social media 


Away to solve irreproducibility? 


A growing backlog of psychology findings that have never been 
reproduced has shaken confidence in the field. One possible 
remedy is to require PhD students to replicate at least one study 
from their own specialism as part of their education, write UK 
psychologists Brian Earp and Jim Everett in an opinion piece 
in Frontiers in Psychology. “Best suggestion I've heard [with 
respect to] the replication crisis in psychology. Plus seems like 
just smart pedagogy,’ tweeted Jonathan LaTourelle, a PhD 
student in the philosophy of cognitive science at Arizona 
State University in Tempe. But ethicist Owen Schaefer at the 
University of Oxford, UK, suggested in a comment on a blog 
post that the proposal could end up 


> NATURE.COM “disproportionately burdening” graduate 
For more on students. 

popular papers: Front. Psychol. http://dx.doi.org/10.3389/ 
go.nature.com/howwhv  fpsyg.2015.01152 (2015) 


and other, more scaleable 
methods introduce 
contaminants. So Christoph 
Stampfer at RWTH Aachen 
University in Germany and 
his colleagues synthesized a 
layer of graphene on copper, 
and used a compound called 
hexagonal boron nitride 
to peel the graphene off 
and transfer it to another 
substrate. This yielded 
crystals with fewer flaws 
than those made using other 
techniques, and the copper 
could be used again to 
produce more graphene. 
The resulting material has 
electronic properties that rival 
the best graphene made by 
other, less scaleable methods, 
the authors report. 
Sci. Adv. 1, 1500222 (2015) 


VIROLOGY 


Ancestral virus 
for gene therapy 


An ancient virus 
reconstructed by researchers 
could make gene therapy 
more efficient. 

Viruses are used in 
such therapies to deliver 
functioning genes to diseased 
cells in the body, but better 
viruses are needed to transfer 
genes more efficiently. 
Luk Vandenberghe of the 
Massachusetts Eye and Ear 


Infirmary in Boston and 

his colleagues analysed the 
amino-acid sequences of the 
proteins that coat 75 adeno- 
associated viruses (AAV), 

5 of which are being tested in 
human gene-therapy trials. 
They predicted how the 
structure of these proteins 
might have evolved, and came 
up with protein sequences 
for 9 AAVs that might have 
been ancestors of the current 
viruses. They synthesized the 
ancient AAVs and found that 
one, Anc80, could efficiently 
transfer genes to muscles and 
the retina in mice (pictured), 
and to the liver in both mice 
and monkeys. 

Anc80 did not trigger any 
negative side effects in these 
animals that would prevent it 
from delivering genes to cells. 
Cell Rep. http://doi.org/6j6 (2015) 


> NATURE.COM 

For the latest research published by 
Nature visit: 
www.nature.com/latestresearch 
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SEVEN DAYS 


Ebola vaccine 


An experimental Ebola vaccine 
seems to confer total protection 
against infection in people who 
are at high risk of contracting 
the virus, according toa trial 

in Guinea (A. M. Henao- 
Restrepo et al. Lancet http:// 
dx.doi.org/10.1016/S0140- 
6736(15)61117-5; 2015). 

The vaccine, developed by 

the Public Health Agency of 
Canada and licensed to drug 
company Merck, is made 

from a livestock virus that has 
been engineered to produce 

an Ebola protein. The trial 
included two arms: of the 
2,014 people who received 

the vaccine immediately, 

none developed Ebola ten 

days after getting the vaccine. 
There were 16 infections in the 
2,380 people who were given 
the vaccine 3 weeks later. See 
page 13 for more. 


Rice retraction 

A paper claiming that the 
genetically engineered crop 
Golden Rice was an effective 
vitamin A supplement in 
children in China was retracted 
from The American Journal of 
Clinical Nutrition on 29 July. 
The journal retracted the paper 
(G. Tang et al. Am. J. Clin. Nutr. 
96, 658-664; 2012) because the 
authors, led by Guangwen Tang 
of Tufts University in Boston, 
Massachusetts, did not fulfil 
ethical requirements. Among 
other things, the journal says, 
they did not provide evidence 
that they had ethical approval 
for their experiments. The 
authors filed an injunction 

last year to stop the retraction, 
but were denied one bya 
Massachusetts court on 17 July. 


White rhino dies 


Just four northern white 
rhinos (Ceratotherium simum 
cottoni) remain in the world 


The news in brief 


Arrests in Hawaii over telescopes 


Protests over telescope building on Hawaii's 
mountains led to the arrest on the night of 

30 July of more than 20 demonstrators on 

the island of Maui, where a 4.2-metre solar 
telescope is under construction on Haleakala. 
Seven protesters were also arrested at Mauna 
Kea on Big Island, in the latest escalation in the 


after a female named Nabiré 
died at the Dvir Kralové Zoo 
in the Czech Republic on 

27 July. The northern white 
rhino is a subspecies of the 
white rhino, which numbers 
around 20,000 in the wild. 
The zoo reported that Nabiré 
died when a large cyst inside 
her ruptured. This leaves one 
male and two female northern 
white rhinos in Kenya and 
one female in San Diego, 
California. 


Crash investigation 
SpaceShipTwo, a rocket 

plane owned by spaceflight 
company Virgin Galactic, 
broke up during a test flight 
in October 2014 because the 
co-pilot activated a braking 
system too soon. In its 28 July 
report on the crash, the US 
National Transportation 
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Safety Board stated that 

the plane’s designer, Scaled 
Composites in Mojave, 
California, failed to account 
for human error in the pre- 
flight hazard analysis that it 
provided to federal-aviation 
regulators. This omission 
ultimately led to the accident. 


Anthrax grounded 
The US shipping company 
FedEx says that it will 

stop carrying dangerous 
pathogens, or ‘select agents. 
The move comes after the 
US military announced in 
May that it had accidentally 
shipped live anthrax 

spores to nearly 200 labs in 
9 countries. According to 
USA Today, which revealed 
the announcement on 29 July, 
FedEx is one of only two 
companies that ship select 
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stand-off over adding the planned Thirty Meter 
Telescope (TMT) to the 13 telescopes near the 
summit of Mauna Kea, which is sacred to Native 
Hawaiians. Protestors are also expected at the 
International Astronomical Union meeting in 
Honolulu from 3 to 14 August. Construction of 
the TMT remains on hold indefinitely. 


agents, leaving researchers 
concerned that it will be 
difficult to transport samples 
in the case of an epidemic. 
FedEx says that it carried at 
least one of the live anthrax 
samples, from Dugway 
Proving Ground in Utah. 


Punish poaching 

Ina resolution passed by 

its general assembly on 

30 July, the United Nations 
asked its members to step 

up the fight against wildlife 
crime. The decree expresses 
“serious concern” over the 
poaching of rhinos and 
elephants in Africa and urges 
member states to strengthen 
legislation to prevent and 
prosecute illegal trade. The 
resolution follows the London 


HAWAII DEPT. LAND AND NATURAL RESOURCES 


KIERAN DODDS/PANOS 


SOURCE: NATIONAL HURRICANE CENTER 


Declaration of February 2014, 
in which 41 nations agreed 

to deem poaching a serious 
crime — a technical UN term 
designed to result in harsher 
punishment for offenders. 


US emissions curb 
US President Barack Obama 
announced landmark 
regulations on 3 August 

to curb greenhouse-gas 
emissions from power plants. 
The regulations, developed 

by the US Environmental 
Protection Agency (EPA), 

call for a 32% reduction in 
emissions from the 2005 level 
by 2030. The target is stricter 
than the one proposed in 

June last year, which laid out a 
30% cut. US states must now 
develop plans for how to reach 
the targets and submit them to 
the EPA by September 2016. 
See go.nature.com/fbidf5 for 
more. 


Chimp ruling 
Chimpanzees are not legal 
persons, a New York court 
ruled on 29 July. The activist 
group Nonhuman Rights 
Project had filed a suit in 2013 
on behalf of two research 
chimpanzees at Stony Brook 
University in New York, 
arguing that the animals were 
being unlawfully detained. In 
her decision, judge Barbara 
Jaffe wrote that she was 
bound by legal precedent. 
But she did not discount the 
group’s argument, saying 


TREND WATCH 


This month marks the tenth 


anniversary of Hurricane Katrina's 
landfall along the northern Gulf 
of Mexico coast. The destruction 


left by the hurricane prompted 
the US National Oceanic and 
Atmospheric Administration 


(NOAA) and other scientists to 
put more effort into developing 
tools and techniques to improve 
the accuracy of their forecasts. 
The forecast error on NOAAs 
hurricane-path predictions two 
days ahead of landfall decreased 
from 204 kilometres in 2005 to 
120 kilometres in 2014. 


that the judicial system is 
slow to embrace change, but 
chimpanzees might one day 
gain legal rights. 


Physics spy suspect 


A Russian physicist who 
worked in the Netherlands 
is under suspicion of 
having passed confidential 
information to the Russian 
intelligence service. 
Photonics and quantum- 
computing researcher Ivan 
Agafonov, who denies the 
allegations, worked at the 
Eindhoven University 

of Technology (TUE). A 
statement published by the 
TUE on 28 July says that the 
university was informed by 
Dutch intelligence services 
in July 2014 that Agafonov 
was in contact with Russian 
intelligence services. The 
statement adds that the 
university has terminated 
Agafonov’s employment, 


and Dutch authorities have 
revoked his residency permit 
on suspicion of espionage. 


Telescope head 

The president of the 

Giant Magellan Telescope 
Organization (GMTO) has 
stepped down. Physicist Ed 
Moses led the GMTO for less 
than a year. He left the post 

to deal with family matters, 
according to a 28 July statement 
by the governing board. 

Efforts to build the US$1- 
billion telescope — which 

is scheduled for first light in 
2022 — will be led by Patrick 
McCarthy, an astronomer at 
the Las Campanas Observatory 
in La Serena, Chile, who has 
previously helped to lead the 
GMTO, until a replacement is 
appointed. 


Intelligence chief 
Jason Matheny has been 
made director of the US 
Intelligence Advanced 
Research Projects Activity 
(IARPA), the agency 
announced on 3 August. 
IARPA funds high-risk, 
high-pay- off research for 
the US intelligence agencies. 
Matheny, who previously 
founded biotechnology 
organizations working to 
develop lab-grown meat, has 
directed IARPAs efforts to 
forecast events and scientific 
advances. He will be the 
agency’s third director since 
it was founded in 2006. 


IMPROVING HURRICANE PREDICTION 


NOAA's ability to predict where a hurricane in the Atlantic basin will 
hit in the hours preceding landfall has improved steadily. 
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SEVEN DAYS | THIS WEEK | 


9-13 AUGUST 
Researchers with 

an interest in how 
light can be used to 
probe anything from 
nanoscale sensors to the 
outer cosmos meet in 
San Diego, California, 
at the SPIE Optics + 
Photonics conference. 
go.nature.com/48vevv 


8-12 AUGUST 

The 12th World 
Congress on 
Inflammation 
convenes in Boston, 
Massachusetts. 
Discussions will cover 
the latest research on 
the role of inflammation 
in disease and ways to 
control or halt it. 
inflammation2015.org 


9-13 AUGUST 

How do logic and 
relativity interconnect? 
Find out at the 2nd 
Logic, Relativity and 
Beyond conference in 
Budapest, Hungary. 
go.nature.com/st8s87 


FACILITIES 


Ecology funding cut 
The US National Ecological 
Observatory Network 
(NEON) — a countrywide 
project to measure the effects 
of climate change — will cancel 
its experiment to monitor how 
streams respond to simulated 
environmental stressors. 

The move comes after the 
National Science Foundation 
told programme managers on 
31 July that it was changing the 
scope of the project. NEON has 
a budget of US$433.7-million 
over 5 years. It is constructing 
more than 100 data-collection 
sites around the United States, 
but it has faced charges of 
mismanagement (see Nature 
http://doi.org/6k5; 2014). 


> NATURE.COM 
For daily news updates see: 
www.nature.com/news 
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A pioneering clinical trial in Guinea could provide a model for use in future disease outbreaks. 


INFECTIOUS DISEASE 


Ebola on trial 


Rapid development of an effective vaccine has implications 
for the epidemic in West Africa and for clinical-trial policy. 


BY DECLAN BUTLER, EWEN CALLAWAY & 
ERIKA CHECK HAYDEN 


en Ebola broke out in West Africa 
in December 2013, triggering the 
largest-ever epidemic of the disease, 


there was no vaccine or drug that had been 
shown to be safe and effective in people. Just 
20 months later, a vaccine seems to confer total 
protection against infection, according to the 
preliminary results ofa trial in Guinea that were 
published on 31 July (A. M. Henao-Restrepo 
et al. Lancet http://dx.doi.org/10.1016/S0140- 
6736(15)61117-5; 2015). Nature looks at 
the implications of the trial’s success for the 
ongoing epidemic, which has killed more than 
11,000 people, as well as for how future clinical 
trials are conducted in outbreaks. 


How did the vaccine come about? 

Called rVSV-ZEBOY, it consists of a livestock 
virus that has been genetically engineered to 
masquerade as the Ebola virus (see ‘Masters 
of disguise’). It was developed by the Public 
Health Agency of Canada, licensed to the 
pharmaceutical company Merck and tested 
by an international collaboration of funders, 
scientists, companies, organizations and 
governments, including the World Health 
Organization (WHO). The trial was carried 
out in Guinea, where the epidemic began, 
and used a ‘ring’ design in which contacts of 
infected people — such 
as members of the same 
household — are vacci- 
nated, as are any subse- 
quent contacts of those 


For Nature's special 
on Ebola, see. 
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people. It comprised two groups: one received 
immediate vaccination and the other received 
the vaccine three weeks later. 


What did the trial find? 

Of the 2,014 contacts (of 48 people infected 
with Ebola) who received the vaccine immedi- 
ately, none developed Ebola after a 10-day win- 
dow — enough time for the body to summon 
an immune response to the vaccine and for any 
pre-existing Ebola infections to have revealed 
themselves. (A few people did develop the dis- 
ease between 1 and 10 days after vaccination.) 
By comparison, 16 people out of the 2,380 con- 
tacts (from 42 cases) in the control group 
became infected during this time. The vaccine 
was therefore deemed to have provided 100% 
protection against the virus in this trial. 


100% protection sounds too good to be true. 

It probably is. The study was quite small, so the 
true protection rate may be slightly lower, says 
Marie-Paule Kieny, assistant director-general 
for health systems and innovation at the World 
Health Organization (WHO). An independ- 
ent committee overseeing the trial considered 
the preliminary results so convincing that the 
control group was dropped on 26 July, and all 
contacts are now being vaccinated immediately. 
This will yield more data on the true levels of 
protection. But there is already excitement 
about the vaccine. “The results as reported are 
so striking that even if there are some issues in 
the study, it appears very likely that it’s effec- 
tive, says Jesse Goodman, a former US Food 
and Drug Administration official who is now 
at Georgetown University in Washington DC. 


For how long does the vaccine work? 

That is unknown. The trial was designed to 
test whether ring vaccination could snuff out 
outbreaks, and the several weeks of protection 
that it is known to provide is enough to do this. 
“That's good news for an outbreak situation,” 
says Adrian Hill, director of the Jenner Institute 
at the University of Oxford, UK, who is involved 
in testing a different Ebola vaccine. However, 
he says, it remains to be seen whether the pro- 
tection lasts any longer. “Will it work at six 
months? This trial doesnt tell us that.” Longer- 
term — ideally lifelong — immunity is needed 
for a vaccine to provide sustained protection 
to health workers and other high-risk groups 
during an epidemic, or to mass-vaccinate 
populations should Ebola become endemic. 
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MASTERS OF DISGUISE 


The rVSV-ZEBOV vaccine is made by genetically engineering a weakened 
form of vesicular stomatitis virus (VSV) so that it impersonates the Zaire 
species of Ebola virus, which caused the epidemic in West Africa. 


1. Researchers snip out the RNA 
that codes for the virus’s surface 
glycoprotein (GP), which allows the 
virus to latch onto human cells. 


> Other vaccine trials, including the one that 
Hill is involved in, are testing for longer-term 
protection. But the fall in the number of Ebola 
cases — to 20-30 per week over the past few 
months — means that the trials may struggle to 
provide clear results. 


Could the rVSV-ZEBOV vaccine help to end the 
epidemic in West Africa? 

The vaccine will continue to be used in Guinea 
as part of the clinical trial. Many researchers 
hope that it will be used in Liberia and Sierra 
Leone too, to end the epidemic — although case 
numbers have plummeted, there is a continued 
risk of flare-ups as well as of spread to nearby 
countries (see page 27). However, some regula- 
tory hurdles need to be cleared first. Deploy- 
ment in those nations could occur as part of 
an expanded clinical-trial regime or through 
emergency authorization by regulators, says 


2. They then remove the stretch 
of RNA that codes for the VSV’s 
surface protein and replace it with 
that for the Ebola GP. 
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RNA removed 


3. The resulting vaccine tricks the 
human immune system into 
mounting a response against the 
Zaire Ebola virus. 


Gregory Hartl, a spokesperson for the WHO. 
The authorities there are now considering 
whether the available data are sufficient to 
license the vaccine for use outside a clinical- 
trial setting, a process that could take weeks to 
months, according to the WHO. 


Is it unusual to do a trial during an outbreak? 

Yes. Getting clinical trials approved by regula- 
tors usually takes years, as does conducting the 
gold standard of randomized controlled tri- 
als. That means that outbreaks tend to be over 
before trials can even begin. Clinical trials are 
also usually done in well-equipped research 
hospitals, and quality trials have generally been 
considered impossible to carry out in the often- 
atrocious field conditions of deadly outbreaks 
(see Nature 513, 13-14; 2014). The urgency of 
tackling Ebola changed all that. In September, 
the WHO-supported collaboration pulled out 


all the stops to accelerate testing of treatments 
and vaccines that had shown promise in ani- 
mals. It cut through the red tape and came up 
with trial designs that could quickly provide 
data at least good enough to inform efforts to 
control the outbreak. The rVSV-ZEBOV trial is 
one of several that came about as a result. 


Can the fast-track approach be applied to 
other diseases? 

Hill suggests that vaccines could quickly be 
developed for many other epidemic threats. He 
recommends that research on vaccines against 
such pathogens be accelerated so that clinical 
trials can be done now to test their safety; those 
that pass muster would be stockpiled, ready for 
efficacy tests as soon as an outbreak occurs. 
Pathogens considered priority health threats 
include Marburg virus, which is in the same 
family as Ebola, and the viruses that cause Mid- 
dle East respiratory syndrome (MERS), Lassa 
fever and chikungunya. 


Are lessons likely to be learned from rVSV- 
ZEBOV’s success? 

The hope is that it will provide a model for deal- 
ing with future outbreaks. “This is illustrating 
that it is feasible to develop vaccines much faster 
than we've been doing,” says Hill. And there 
seems to be support for change at the highest 
level. Margaret Chan, director-general of the 
WHO, said on 31 July that the agency is devel- 
oping a “blueprint” for accelerated development 
of measures to counteract potential epidemics. 
The plan aims to reduce the time from the rec- 
ognition of an outbreak to availability of coun- 
termeasures to four months or less, and would 
include putting trial designs and regulatory 
approvals in place in advance of an outbreak. 
“No one wants to see clinicians, doctors, left 
empty-handed ever again,’ said Chan. m 


Cancer-physics project 
accused of losing ambition 


Trailblazers of physical oncology complain that US National Cancer Institute programme 


has lost sight of its mission. 


BY GABRIEL POPKIN 


n ambitious initiative that has deployed 
physics in the fight against cancer since 
2009 has awarded a second round of 


grants. But some pioneers of the field, known 
as physical oncology, protest that the US 
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National Cancer Institute (NCI) has lost sight 
of the programme’ original vision. 

In June, the NCI announced that it would 
give each of four Physical Sciences-Oncology 
Centers (PS-OCs) around US$2 million a 
year for five years. But the funded projects are 
too unambitious to produce major paradigm 
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shifts, argues Robert Austin, a physicist at 
Princeton University in New Jersey who 
helped the NCI to lay the groundwork for the 
programme, and whose centre was not funded 
in the second round. 

The programme is “losing patience with 
those of us who want to understand the 


STEVE GSCHMEISSNER/GETTY 


fundamentals”, says Austin. 

NC officials say that the latest awards, along 
with two rounds of funding planned for later 
this year or next year, show the institute’s con- 
tinuing commitment to the interdisciplinary 
approach. “The fact that this programme is 
renewed, while it’s not in the same original 
form, is still an indication of support,” says 
Larry Nagahara, a former director of the 
programme who left the NCI for Johns 
Hopkins University in Baltimore, Mar- 
yland, this month. Officials insist that 
there has been no move away from 
physics, although the programme ‘f 
also embraces related fields such 5) 
as engineering and applied math- 
ematics. “We're sort of agnostic on 
the spectrum of research that people 
are working on,’ says current pro- > 
gramme head Sean Hanlon. 7 

The PS-OC programme was largely 
the brainchild of Anna Barker, who in 
2007-08, as a deputy director at the NCI, 
set up workshops that helped to lay the pro- 
gramme’ intellectual foundation. She and 
other proponents pointed out that although 
billions of dollars of research investment into 
drugs and therapies have reduced mortality 
for some cancers, they have not produced a 
fundamental understanding of the disease. 
Programme leaders proposed to open a new 
front in the war on cancer by recruiting physi- 
cists to study cancer as a physical rather than 
strictly biological phenomenon. 


A DIFFERENT PERSPECTIVE 

In 2009, the NCI gave grants averaging 
$2.5 million a year for 5 years to 12 centres, 
each co-directed by a physical scientist anda 
cancer biologist. Some researchers attempted 
to re-envision cancer from the bottom up. For 
example, physicist Paul Davies of Arizona State 
University in Tempe, who along with Austin was 
involved in the initial programme workshops 
(see Nature 474, 20-22; 2011), has proposed 
that a cell becomes cancerous when it reverts 
to a primitive evolutionary state. He is investi- 
gating whether ancient genes become activated 
during cancer development (P. C. W. Davies and 
C. H. Lineweaver Phys. Biol. 8, 015001; 2011). 
Austin has explored the evolution of drug resist- 
ance by using microfluidic devices to expose 
tumour cells to chemical gradients (A. Wu et al. 
Proc. Natl Acad. Sci. USA 110, 16103-16108; 
2013), and has suggested that cancer might 
result from environmental stress rather than 
from genetic mutations. 

Others have sought to develop or refine 
mathematical or biophysical tools for cancer 
research. At the Dana-Farber Cancer Insti- 
tute in Boston, Massachusetts, for example, 
researchers have built computer simulations 
to predict which genetic and cellular changes 
are most likely to lead to certain cancers, and 
which treatment approaches are most likely 
to succeed. Other centres have used advanced 
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Cell division and other cancer processes are being studied by physicists looking for fundamental insights. 


microscopy and spectroscopy. Such pro- 
jects are valuable, but do not seek the kind of 
fundamental understanding of cancer that 
is the hallmark of the physics approach, says 
Herbert Levine, a physicist at Rice University 
in Houston, Texas, who studies cancer but has 
not received PS-OC funding. 

The awards announced in June went to 
existing centres at Northwestern University 
in Chicago, Illinois, and Dana-Farber, as well 
as to two new ones — at Columbia University 
in New York City and the University of Penn- 
sylvania in Philadelphia. Neither Austin nor 
Davies had their proposals funded. Those 
decisions may reflect the tangible results 

produced by less 
paradigm-challeng- 
ing projects, Levine 
says. He thinks that 
projects seeking 
fundamental break- 
throughs, such as 
Austin’s, need more 
time to achieve their 
visions. “The lofty goal of helping find a new 
set of directions in biology with the help of 
physicists, computer scientists, whatever — I 
don’t think they quite got there.” 

Barker, who left the NCI in 2010 and is now at 
Arizona State, says that the PS-OCs have made 
progress in a number of areas, including under- 
standing cancer evolution, predicting when a 
cell will become metastatic and developing 
biomarkers for cancer. But she agrees that five 
years was probably too short for the more ambi- 
tious efforts. “For these large consortia, it takes 
about the first three years to get them all work- 
ing together, to get a common language in place, 
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to get common core resources developed,’ she 
says. “In terms of judging the programme, Id 
like to have seen it a couple years hence.’ 

NCI programme managers say that the plan 
was always to reopen the funding competition 
after five years, rather than simply to extend 
existing sites. More researchers applied for the 
second round of funding, they say, and there 
was not enough money for everyone. But they 
point out that physical oncologists now have 
more funding options. “I think most people 
will find somewhere to have their work sup- 
ported,’ says Hanlon, whether through future 
PS-OC awards, other NCI programmes or 
external sources. 

Levine, for example, has funding from the 
state of Texas and has been involved in a part- 
nership between the US National Science Foun- 
dation and private donors. The Francis Crick 
Institute, set to open this year in London, prom- 
ises to bring more physicists into biomedical 
research (see Nature 509, 544-545; 2014). Aus- 
tin and Davies say they may look overseas or 
to private foundations to continue their work. 

NCI programme managers say that the 
diversification of funding sources shows that 
the field is gaining support and recognition. 
They also point to the journal Convergent 
Science Physical Oncology, launched in June by 
IOP Publishing of Bristol, UK, and to stand- 
ing sessions on physics and the evolution of 
cancer at the American Physical Society’s 
annual March meeting and at meetings of the 
American Association for Cancer Research. 
“Those types of sessions didn't exist five years 
ago — now you can find them at several of 
these meetings,” says Nagahara. “That's a sign 
of success.” m 
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Malaria-carrying mosquitoes (Anopheles gambiae) are a prime target for gene-editing techniques. 


BIOTECHNOLOGY 


Caution urged over 
DNA editing in wild 


Method for rapidly altering gene pools could harm ecosystems. 


BY HEIDI LEDFORD 


C C rap!” That was the first word out of 

( Kevin Esvelt’s mouth as he scanned 

a paper’ published in Science last 

March. The work described the use of a gene- 

editing technique to insert a mutation into 

fruit flies that would be passed on to almost 

all of their offspring. Although intriguing, the 

report made Esvelt feel uneasy: if engineered 

flies escaped from a lab, the mutation could 
spread quickly through a wild population. 

But that was exactly what exhilarated 
molecular biologist Anthony James at the Uni- 
versity of California, Irvine. “Holy mackerel!” 
he wrote to the study’s authors. “Can we use it 
in mosquitoes?” 

On 30 July, the US National Academy of 
Sciences, Engineering, and Medicine (NAS) 
held the first in a series of meetings meant to 
find ways to balance the promise and perils of 
the technique, called ‘gene drive. The method 
can rapidly modify not just a single organism 
but a whole population, by inserting a desired 
genetic modification into an organism along 
with DNA that increases the rate at which the 
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change is passed to the next generation. The 
technique could be used to render mosquitoes 
unable to carry malaria parasites or to wipe 
out harmful invasive species, but it could also 
have unanticipated environmental costs and 
might be impossible to reverse. “Once this is 
out there, you cannot call it back,” says Wal- 
ter Tabachnick, a population geneticist at the 
University of Florida in Vero Beach. 
The idea of gene 


drivehasbeenaround “How doyoutest 
for more thanadec- suchasystem, 
ade’. Butits practical- and how doyou 
ity was givenahuge doit safely?” 


boost around three 

years ago with the arrival of CRISPR, a gene- 
editing technique that allows precise changes 
to an organism’s DNA’. 

The Science paper’, by developmental biolo- 
gist Ethan Bier and his student Valentino Gantz 
at the University of California, San Diego, used 
CRISPR to insert a modification into genes on 
both chromosomes in a pair, so that when the 
flies bred, they would pass the modification on 
to practically all of their offspring. 

The work came out of a desire to develop 
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a system that would make it easier to study 
genetic changes in organisms that are difficult 
to breed in the laboratory. Because CRISPR has 
been shown to work in a wide range of creatures, 
researchers hope one day to be able to engineer 
wild populations in much the same way. 


CALL FOR CONCERN 

Mindful of both the potential and the risks, 
Esvelt, a bioengineer at Harvard Medical 
School in Boston, Massachusetts, brought 
together a group of scientists to write a Com- 
ment in Science’, published last week, laying 
out the need for multiple containment strat- 
egies for gene-drive research that is done in 
the laboratory. Meanwhile, the NAS meet- 
ing marks the start of a 15-month search for 
ways to minimize the risk in advance of field 
releases. Because no one is known to have 
made CRISPR work in mosquitoes — the 
mostly likely organism for the application of 
the technology — the committee has some 
time to do its work. 

But there is still urgency, noted Todd 
Kuiken, who explores the interface of science 
and policy at the Wilson Center, a think tank in 
Washington DC. CRISPR gene-drive technol- 
ogy is developing at a breakneck pace, and has 
the potential to dramatically alter ecosystems 
in unexpected ways. At the meeting, Kuiken 
used the invasion of Asian carp into some 
US lakes as an example of how little is known 
about some wild ecosystems. “While this is an 
invasive species, it’s also an established spe- 
cies,” he says. “I don’t think we have a good 
understanding of how we evaluate what hap- 
pens when we remove a species from as large 
an ecosystem such as this.” 

Meanwhile, Esvelt and his colleagues are 
studying the CRISPR gene-drive system in 
the nematode Caenorhabditis elegans to learn 
more about what happens to a population as 
engineered DNA is passed down through gen- 
erations, accumulating mutations as it goes. 
They are also testing ways to make sure that a 
gene drive can be countermanded once it has 
been set loose. 

These issues need immediate attention, says 
geneticist Daniel Wattendorf at the US Defense 
Advanced Research Projects Agency (DARPA) 
in Arlington, Virginia. Security concerns may 
mean that DARPA needs to start working on 
the technology before guidelines are drawn up, 
he adds. 

And Tabachnick remains concerned that 
these preparations may not suffice. “How do you 
test such a system, and how do you do it safely?” 
he asks. “I'm not convinced that any of this work 
could ever possibly provide the assurance of 
safety that one might demand.” mSEEEDITORIALP.S 
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Scrap of brain 
seen in full detail 


Mouse map is step towards reconstructing human brain. 


BY ALISON ABBOTT 


ix years might seem like a long time to 

spend piecing together the structure ofa 

speck of tissue vastly smaller than a bead 
of sweat. But that is how long it took a team 
led by cell biologist Jeff Lichtman at Harvard 
University in Cambridge, Massachusetts, to 
create the first complete reconstruction 
of a piece of tissue in the mammalian 
neocortex. 

The reconstruction (N. Kasthuri 
et al. Cell 162, 648-661; 2015) is 
essentially a 3D digital map that 
allows biologists to see the detail 
and relative positions of every 
individual cell part in a piece of 
tissue measuring 1,500 cubic 
micrometres, helping to reveal 
how the brain works. 

It is a far cry from reconstructing all of the 
100 billion or so cells that make up the entire 
human brain, which is one of neuroscientists’ 
ultimate goals. But Christof Koch, president of 
the Allen Institute for Brain Science in Seattle, 
Washington, notes that the various technolo- 
gies involved will speed up “tremendously” 
over the next decade: “I would call this a very 
exciting promissory note.” 

Lichtman’s team has its sights set on a further 
challenge: reconstructing a cubic millimetre of 
rodent neocortex — a piece of tissue more than 
600,000 times larger than the present achieve- 
ment. The researchers will do that work as part 
of a consortium that in July received prelimi- 
nary approval for funding by the US govern- 
ment agency IARPA (Intelligence Advanced 
Research Projects Activity), which promotes 
high-risk, high-pay-off research. 

The neocortex is the most recently evolved 
brain region, and is of particular interest to 
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neuroscientists. As with other brain areas, 
its function is determined by how individual 
neurons are connected to each other through 
structures called synapses. These structures, 
which can be seen only with an electron micro- 
scope, allow chemical or electrical signals to pass 
between cells and can be pruned or created anew 
as an animal attunes itself to its environment. 


Individual cell parts are visible in this 3D map ofa 
tiny piece of mouse brain. 


Reconstructing this level of detail required 
a multistep procedure. A diamond blade 
shaved a region of mouse neocortex called the 
somatosensory cortex into several thousand 
slices, which were continuously rolled onto 
a single long strip of special plastic tape at a 
rate of 1,000 sections every 24 hours. The sec- 
tions were then imaged with a scanning elec- 
tron microscope powerful enough to capture 
even the tiny vesicles that contain the chemical 
signalling molecules in synapses, known as 
neurotransmitters. 

To reconstruct the scrap of tissue, the team 
homed in at the highest resolution around the 
finger-like dendrites of two neighbouring neu- 
rons. The researchers aligned the relevant digi- 
tal images so that the parts of each cell in each 
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slice coincided with their positions on adjacent 
slices. To follow the individual cells through the 
different slices, they developed computer pro- 
grams to assign a particular colour to every cell 
and to trace each one, either automatically or 
with input from researchers. 

The volume of tissue used was too small 
to contain an entire cell, but large enough to 
contain fragments of more than 1,600 neurons 
and of other brain cells of at least six different 
types, as well as around 1,700 synapses. 

One feature revealed by this reconstruction, 
which is now freely available to the scientific 
community, was that one neuron does not form 
synapses with another neuron just because the 
two happen to be physically close to each other, 
as some neuroscientists had assumed. Instead, 
the cells have clear preferences for particular 
neighbours. This had already been observed 
in the retina and in the hippocampus, both of 
which are evolutionarily older than the neo- 
cortex. The answer to what confers these pref- 
erences may be found in ongoing studies 
to identify the molecular components 

of each synapse, says neuroscientist 

Seth Grant at the University of 
Edinburgh, UK. 

The Lichtman team is now 
working on similarly sized 
reconstructions of the cortical 
tissue from six-day-old mice, 

to see whether synapses behave 
the same way in an earlier stage of 
development, and on reconstructing a piece of 
human brain acquired during surgery. 

As well as improving our understanding of 
the brain, such reconstructions could inspire 
new methods of computing. The consortium 
that is currently negotiating with [ARPA is 
based at Harvard and at the Massachusetts 
Institute of Technology (MIT) in Cambridge 
and consists of 13 labs. Under the preliminary 
contract, the consortium would be part of 
IARPAss Machine Intelligence from Cortical 
Networks (MICrONS) programme and would 
receive tens of millions of dollars over five 
years, says MICrONS head Jacob Vogelstein. 
The general goal of the programme, he says, 
“is to revolutionize machine learning by 
reverse-engineering from codes discovered 
in the brain” He adds: “IARPA also invests in 
neuroscience because we are interested as well 
in understanding cognition — how people 
behave and make decisions.”= 
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Stanene makes 
its debut 


Graphene’s tin cousin may 
conduct without heat loss. 


BY CHRIS CESARE 


wo years after physicists pre- 
| dicted that tin should be able to 
form a mesh just one atom thick, 
researchers report that they have made 
it’. The thin film is called stanene (from 
the Latin stannum meaning tin, which 
also gives the element its chemical sym- 
bol, Sn) and is the latest cousin of gra- 
phene, the honeycomb lattice of carbon 
atoms that has spurred thousands of stud- 
ies into related 2D materials (see Nature 
522, 274-276; 2015). 

In theory, stanene has a talent that 
graphene does not: at room temperature, 
electrons should be able to travel along 
the edges of the tin mesh without col- 
liding with other electrons and atoms as 
they do in most materials. This makes 
the film what physicists call a topologi- 
cal insulator, and means that it should be 
able to conduct electricity without losing 
energy as waste heat, according to pre- 
dictions’ made in 2013 by Shou-Cheng 
Zhang, a physicist at Stanford University 
in California, who is a co-author of the 
latest study. 

A thin film of stanene might be the 
perfect highway along which to ferry cur- 
rent in electric circuits, says Peide Ye, a 
physicist and electrical engineer at Pur- 
due University in West Lafayette, Indiana. 
“Tm always looking for something not 
only scientifically interesting but that has 
potential for applications in a device,’ he 
says. “It’s very interesting work.” 

But Zhang and his colleagues at four 
universities in China cannot yet confirm 
stanene’s predicted exotic properties. 
They created the mesh by vaporizing tin 
in a vacuum and allowing the atoms to 
waft onto a supporting surface made of 
bismuth telluride. Although this surface 
allows 2D stanene crystals to form, it also 
interacts with them, creating the wrong 
conditions for a topological insulator, 
says Zhang. He has already co-authored 
another paper® examining which surfaces 
would work better. m 


1. Zhu, F. F. et al. Nature Mater. http://dx.doi. 
org/10.1038/nmat4384 (2015). 

2. Xu, Y. et al. Phys. Rev. Lett. 111, 136804 (2013). 

3. Xu, Y., Tang, P. & Zhang, S.-C. Preprint at http:// 
arxiv.org/abs/1507.00419 (2015). 
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Marine snails from the US West Coast show signs of shell weakening as a result of ocean acidification. 


OCEAN ACIDIFICATION 


Seawater studies 
come up short 


Experiments fail to predict size of acidification’s impact. 


BY DANIEL CRESSEY 


s the oceans’ chemistry is altered by 
At levels of atmospheric carbon 

dioxide, the response of sea-dwellers 
such as fish, shellfish and corals is a huge 
unknown that has implications for fisheries 
and conservationists alike. But the researchers 
attempting to find an answer are often failing 
to properly design and report their experi- 
ments, according to an analysis of two decades 
of literature. 

Oceans absorb much of the CO, emitted by 
human activities such as coal burning. This 
leads to a variety of chemical changes, such as 
making waters more acidic, which are referred 
to as ocean acidification. 

The United Nations has warned that ocean 
acidification could cost the global economy 
US$1 trillion per year by the end of the century, 
owing to losses in industries such as fisheries 
and tourism. Oyster fisheries in the United 
States are estimated to have already lost millions 
of dollars asa result of poor harvests, which can 
be partly blamed on ocean acidification. 
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The past decade has seen accelerated 
attempts to predict what these changes in 
pH will mean for the oceans’ denizens — in 
particular, through experiments that place 
organisms in water tanks that mimic future 
ocean-chemistry scenarios. 

Yet according to a survey published last 
month by marine scientist Christopher 
Cornwall, who studies ocean acidification 
at the University of Western Australia in 
Crawley, and ecologist Catriona Hurd of the 
University of Tasmania in Hobart, Australia, 
most reports of such laboratory experiments 
either used inappropriate methods or did not 
report their methods properly (C. E. Cornwall 
and C. L. Hurd ICES J. Mar. Sci. http://dx.doi. 
org/10.1093/icesjms/fsv118; 2015). 

Cornwall says that the “overwhelming evi- 
dence” from such studies of the negative effects 
of ocean acidification still stands. For exam- 
ple, more-acidic waters slow the growth and 
worsen the health of many species that build 
structures such as shells from calcium carbon- 
ate. But the pair’s discovery that many of the 
experiments are problematic makes it difficult 


NOAA 


to assess accurately the magnitude of effects 
of ocean acidification, and to combine results 
from individual experiments to build overall 
predictions for how the ecosystem as a whole 
will behave, he says. 

The survey, published in the journal ICES 
Journal of Marine Science, was based ona search 
of the Scopus database of research papers. 
Cornwall and Hurd analysed 465 studies pub- 
lished between 1993 and 2014 that manipulated 
seawater chemistry and found that experiments 
often failed to implement widely accepted 
measures to ensure quality. 

For instance, to ensure robustness, manip- 
ulation studies should use multiple arrays of 
independent ocean-mimicking tanks. And in 
experiments that compare sea animals under 
acidified conditions with controls, these tanks 
should be randomized to remove bias. But the 
pair found that in several papers, researchers 
used one main seawater tank to supply mul- 
tiple, supposedly independent smaller tanks. 


CHEMICAL ERRORS 

The researchers also found mistakes in basic 
chemistry: some authors simply added acid 
to a tank and ignored other chemical changes 
that result from the absorption of CO,, such as 
increased levels of carbonates. Although the fre- 
quency of these chemistry errors has dropped 
since the 2010 publication of an international 


‘best practice’ guide for ocean-acidification 
experiments (see go.nature.com/sp5kgn), the 
researchers found no evidence for improve- 
ments in the design of tank arrays. 

Bayden Russell, an ocean-acidification 
researcher at the University of Hong Kong 
who reviewed drafts of the latest paper, has also 
noticed that some researchers fail to take into 
account the complexities of ocean acidifica- 

tion when designing 


“Truly rigorous their experiments. “It 
designs are is these complexities 
logistically that will drive eco- 
complex and system responses to 


ocean acidification, 
he says. 

Overall, Cornwall and Hurd found that in 
only 27 cases could they be certain that an 
appropriate experimental design had been 
used, and in 278 cases, the design was clearly 
inappropriate. The remaining studies had 
insufficient detail on experimental set-up — a 
problem in itself, note the researchers. 

The pair present a series of recommen- 
dations for well-designed experiments and 
suggest a checklist of details that should be 
included in papers to allow replication of 
experiments, including which chemicals were 
used to manipulate the seawater chemistry and 
the configuration of tank arrays. 

Ove Hoegh-Guldberg, director of the 


expensive.” 
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Global Change Institute at the University of 
Queensland in St Lucia, Australia, suggests 
that researchers also need to take account of 
natural variations in temperature and CO, in 
experimental set-ups and ensure that experi- 
ments that manipulate acidity also simulate the 
accompanying rise in temperature from global 
warming, which many do not. 


PRESSURE TO PUBLISH 
Russell thinks that most research groups are 
now trying to use appropriate designs, but says 
that there are still problems, which he attrib- 
utes to a variety of factors. “Unfortunately, 
truly rigorous designs are logistically complex 
and expensive in both set-up costs and ongo- 
ing maintenance time,’ he says. “When super- 
imposed on the increasing pressure to publish 
rapidly, and in top journals, some researchers or 
research groups are still attempting to publish 
what I would consider sub-standard research” 

Jonathan Havenhand, who works on marine 
invertebrates at the University of Gothenburg 
in Sweden and who co-authored the 2010 
guide, welcomes the latest paper: “Everybody 
should know the stuff that’s in Cornwall and 
Hurd’s paper. It’s good that they wrote it. It’s 
disappointing that they had to.” 

Havenhand suspects the paper will be highly 
cited. “Whether people are going to be happy 
to be citing it, I don't know.” = 
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CHEMISTRY 


BY X1AOZHI LIM 


Slow, solid-state reactions used by lichens 
and Renaissance pigment-makers could 
help to make chemistry greener. 


ristina Mottillo is in no rush. She pours finely ground white 

powder into a Petri dish, carefully rolls it flat with the side of a 

small glass vial, then seals it into a chamber where the heat and 

humidity are like those on a sweltering summer day in the tropics. 
“Now, she says, “we wait.” 

Over the next four days, with no further effort from Mottillo, the three 
chemicals in that powder will gradually turn into ZIF-8: a stable, porous 
compound called a metal-organic framework that could find widespread 
use in carbon capture and storage, and that is worth more than 100 times 
the raw materials’ original value. “The reactants do all the work,” says Mot- 
tillo, a chemistry PhD student at McGill University in Montreal, Canada. 

This isa radical departure from standard chemical-synthesis methods, 
which typically involve dissolving, heating and stirring ingredients in a 
solution to encourage them to react quickly. These techniques are fast and 
well understood, but they tend to consume large amounts of chemicals 
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Lichens make acids that 
react with solid rock in 
processes that chemists 
are trying to mimic. 


and energy, and pose a major environmental 
challenge. An estimated 50-80% of all chemi- 
cal waste produced by industry and university 
labs consists of solvents left over from synthesis, 
separation and purification. 

For around two decades, a worldwide ‘green chemistry movement has 
been trying to find ways to minimize these toxic waste streams. But Mot- 
tillo is one of a handful of scientists starting to adopt an approach that is 
radical even by the movement's standards. Her PhD supervisor, McGill 
chemist Tomislav Frisci¢, describes it as “lazy man’s chemistry”: let a mix 
of solid reactants sit around undisturbed while they spontaneously trans- 
form themselves. More properly called slow chemistry, or even just ageing, 
the approach requires few, if any, hazardous solvents and uses minimal 
energy. If planned properly, it also consumes all the reagents in the mix, 
so that there is no waste and no need for chemical-intensive purification. 

Such processes have been known for millennia: rusting iron is a familiar 
example, as is the decades-long weathering process that produced the 
Statue of Liberty’s green patina. But only now are scientists starting to 
understand these processes and learn how to control them to obtain the 
products they want. In the past decade, research groups have used such 
techniques to produce valuable products, including organometallic com- 
plexes, pharmaceuticals, simple organic compounds and photolumines- 
cent materials. Proponents such as Frisci¢ are hoping to make many more. 

“The ultimate goal,” he says, “is really to clean up the chemical 
manufacturing industry.” 


SLOW AND STEADY 

That could take a while: even the most ardent advocates agree that 
ageing faces an uphill struggle for credibility. Students are taught that 
good chemistry frequently starts with the right solvent: molecules in 
solution can react much faster than they otherwise would, because they 
are free to tumble and collide with one another, which facilitates the 
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making and breaking of bonds. But slow chemistry happens in solids, 
where, by definition, everything is held rigidly in place. “People tend to 
think ofa stone asa grave for molecules,” says Dario Braga, a solid-state 
chemist at the University of Bologna, Italy. 

But that is not true. Solid-state reactions can take months or years, 
but they do exist in nature. In Western Australia, deposits of bird guano 
reacting with copper sulfide minerals in rocks have formed moolooite: 
an uncommon green copper oxalate mineral. Lichens living on rock 
often secrete a mixture of simple, weak organic acids that slowly react 
with minerals to produce complex metal-organic materials, which give 
the lichens some protection from invasive microorganisms. 

Well into the nineteenth century, ageing was used to produce lead 
white, a pigment that is among the most widely used in art history. Manu- 
facturers placed rolled-up sheets of lead over buckets containing a small 
amount of vinegar, and then left the buckets on a bed of manure ina shed. 
The metal would slowly react with water vapour in the air and carbon 
dioxide from the manure, turning into a white material now known to 
be a mixture of lead carbonate and lead hydroxide. The vinegar acted as 
acatalyst, and the decomposing manure kept the shed warm enough for 
the process to proceed at a reasonable rate. After about three months, the 
pigment was scraped off, washed and ground into a fine powder. It was 
used in paintings such as Leonardo da Vinci’s Mona Lisa (around 1506) 
and Johannes Vermeer’s Girl with a Pearl Earring (1665). 

But the recent surge in slow chemistry has nothing to do with art. One 
factor has been interest from the pharmaceutical industry, which would 
like better control over the ageing processes that can slowly degrade drugs 
in pill form (see ‘Slow, slower, slowest’). Another is that solid-state chem- 
istry is no longer the mystery it once was. Reactions in solids tend to be 
much more complex than those in liquids, where molecules quickly dif- 
fuse into a uniform mixture. Solids are often poorly mixed agglomerations 
of very different particles, and are riven with cracks and other structural 
defects, where chemical reactions can take place in different ways and at 
different rates. But rapid improvements in imaging techniques such as 
X-ray crystallography, nuclear magnetic resonance scanning and electron 
microscopy are now giving chemists a better understanding of how those 
reactions proceed in real time, and what they eventually produce. 

Such insights, in turn, have helped proponents to streamline and 
improve on natural ageing processes, while countering the perception 
that ageing is too slow and unpredictable to be of practical use. “It’s not 
slow if you plan in advance,’ insists Fri8ci¢, whose group is trying to better 
understand and exploit ageing reactions. Mottillo's experiments in the 
green synthesis of metal-organic frameworks, for example, are an attempt 
to accelerate the chemistry between minerals and lichen acids. 


PRACTICAL MAGIC 
Another student in Frisci¢’s group has used a different ageing process to 
synthesize various metal-organic materials from oxides of main-group 
metals, transition metals and lanthanides — solids that tend to have very 
high melting points and low solubility. The researchers found that each 
metal oxide ages at a different speed’, so they have patented this as a way 
to isolate the metals from one another: the ageing products are less dense 
than the oxides, so they will float in an intermediate-density liquid while 
the remaining oxides sink. Metal oxides, says Frisci¢, are ideal reagents 
because they are cheap, safe, widely available and produce only water as a 
by-product. Other metal salts, such as chlorides or nitrates, produce acids 
that end up as toxic waste. Furthermore, many metals occur naturally as 
oxides that have to be leached from ore with strong acids; with ageing, 
says Fri8ci¢, one could bypass that step, and make valuable metal-organic 
frameworks directly from rocks. He and his team are working on scaling 
this process up to bring it to the metal extraction and separation industry. 
As for speed, says Fris¢ic, “we can get reactivity going if we use a few 
tricks” — most of them quite straightforward. One is to put the samples 
in a humid atmosphere: water vapour can migrate through holes in the 
solid structures, acting as a lubricant to help atoms or molecules inside 
the solid to diffuse, react or even rearrange into new structures. 
Another technique is to increase the temperature to, say, 45°C — a far 
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Ageing reactions 
happen on a wide 
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1DAY 


@ The antibiotic 
clarithromycin 
changes phase in 
carbon dioxide. 


@ Synthesis of 
milligrams of small, 
iron-based 
organometallic 
compounds. 


4DAYS 


@ Synthesis of 

10 grams of zinc- 
or cobalt-based 
metal-organic 
frameworks. 


1 WEEK 


e Synthesis of 
milligrams of 
copper-based 
luminescent polymer. 


1 MONTH 

e Aspirin degrades 
at 60 °C and 90% 
humidity. 


3 MONTHS 


e Lead-white pigment 
produced from lead. 


1 YEAR 


e Lichen acids etch 
rock to a depth of 
0.3-30 micrometres. 


20 YEARS 


© Green patina 
covers the Statue 

of Liberty as a result 
of an oxidation 
reaction between 
copper, oxygen and 
water vapour. 


FEATURE | NEWS 


cry from the hundreds of degrees typical of 
industrial reaction vessels, but enough to make 
the ageing process run faster. “If we were living 
in India, we could potentially do it outside,” 
says Mottillo. And a third trick is to do what 
the lichens cannot, and grind the reactants 
together into a fine, homogeneous mixture to 
increase the surface area of the particles, where 
they touch each other and can react. That is 
how Mottillo was able to complete her ZIF-8 
synthesis in days instead of weeks’. 

Braga and his group have used ageing, or 
vapour digestion as they call it, to make a 
variety of materials by exposing solid reac- 
tants in a vial to solvent vapour. For example, 
by letting solid copper(I) iodide sit with an 
organic compound for about a week in water, 
acetonitrile or toluene vapour, they obtained 
three new copper-based polymers that glow 
after exposure to ultraviolet light®. Such 
compounds could be used in light-emitting 
diodes and screen displays. But even more 
important, says Braga, is that copper(I) iodide 
is notoriously difficult to dissolve in common 
solvents; vapour digestion offers a way to 
make it and other insoluble materials more 
accessible to chemistry. 

Braga and Frisci¢ are still not sure exactly 
what is happening in these reactions. But 
Dominik Cin¢i¢, an assistant professor at the 
University of Zagreb, is applying their method 
to organic synthesis, a branch of chemistry that 
conventionally relies on energy- and chemical- 
intensive methods. Cincic’s group has demon- 
strated’ that vapour digestion can be used to 
synthesize Schiff bases, small organic mol- 
ecules containing a carbon-nitrogen double 
bond. The team’s next goal is to use the method 
in a one-step synthesis of amines: nitrogen- 
containing organic compounds that are used in 
many dyes and drugs, and that typically require 
two or three synthesis steps. 

Everyone working on ageing-based synthe- 
sis concedes that there is a long way to go. The 
mechanisms are not yet well understood and 
there are no good computational models to 


speed up research. Furthermore, sceptics doubt that the chemical indus- 
try can ever do without solvents entirely. Walter Leitner, a green chemist 
at RWTH Aachen University in Germany, points out that ageing research 
has had the most success in inorganic synthesis — which has historically 
had a much smaller environmental impact than organic synthesis, where 
the most solvents are used. In organic synthesis, he says, the most practical 
target a green chemist can aim for is to find ways to replace toxic solvents 
with environmentally benign ones such as water. 

Still, such objections have not discouraged Friscic. “Everything you 
can do in solution, you can do with ageing, and more,’ he declares. At the 
moment, he is exploring the mechanisms behind ageing by monitoring 
reactions as they happen. 

“All that one needs to do,” he says, “is to explore” m 


XiaoZhi Lim is a freelance writer in Singapore. 
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THE NEAL TIME 


THE WORLD IS ILL-PREPARED FOR THENEXT EPIDEMIC Of 
PANDEMIC, BUT THE HORROR OF THE EBOLA OUTBREAK IN WEST 
AFRICA MAY DRIVE CHANGE 


By Declan Butler 
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f there was one point last year when public-health experts held 

their breath, it was when a Liberian man infected with Ebola virus 

flew to Lagos, Nigeria, in July. Ebola was already raging uncon- 

trolled through impoverished countries in West Africa, killing half 

of those it infected. Now a vomiting man had carried it straight 
to the heart of Africa’s largest megacity — with 21 million inhabitants, 
many of whom live in slums. Experts were horrified at the prospect that 
the virus might rip through the city — and then, because Lagos is an 
international travel hub, spread farther afield. 

“The last thing anyone in the world wants to hear is the two words, 
‘Ebola and ‘Lagos’ in the same sentence,’ said Jeffrey Hawkins, the 
US consul general in Nigeria, at the time. 

In the end, this apocalyptic scenario did not play out. Because Nigeria 
is a focus of global efforts to eradicate polio, it has a decent infrastruc- 
ture of virology labs and epidemiologists and the capacity to run large 
public-awareness campaigns. Authorities quickly repurposed this tool- 
box to tackle Ebola, and the outbreak was contained with just 20 cases 
in all. The number of infections from Ebola in 
Guinea, Liberia, and Sierra Leone has dropped 
from its peak of hundreds of cases per week, 
to 20 or 30. But what has not faded is the fear 
that, at some point in the future, the world 
will face an outbreak of a deadly disease that 
spreads much more easily between people than 
Ebola does, and so results in an epidemic or 
pandemic that is even more terrible than that 
in West Africa. 

Quite what that disease will be, no one knows. One worst-case sce- 
nario is that of an influenza virus as deadly as the one behind the 1918 
pandemic, which raced across the world killing as many as 50 million 
people. Other virus families also keep researchers awake. Poxviruses are 
one: smallpox was eradicated in 1980 after killing some 300 million peo- 
ple in the twentieth century, but there are many animal poxviruses that 
could evolve to replace it. Paramyxoviruses are another major worry: the 
family includes Nipah virus and Hendra virus, both of which have trig- 
gered small outbreaks that caused serious illness and death. But uncer- 
tainty prevails. “Second on the list is the one we haven't thought of, and 
at the very top is the one we can't imagine,” says 
infectious-disease specialist David Morens at 
the US National Institute of Allergy and Infec- 
tious Diseases in Bethesda, Maryland. 

The Ebola epidemic has spurred researchers 
and public-health experts to call for a major 
overhaul of the world’s approach to epidemic 
threats. What's needed, they argue, is better monitoring for the emer- 
gence and re-emergence of pathogens, and beefed-up health systems 
in the many poor countries that are often on the frontline of epidemics. 
They want to see nimble task forces that are able to respond rapidly and 
forcefully to outbreaks, and a multibillion-dollar global fund to quickly 
develop countermeasures such as drugs and vaccines. 

At the same time, the risks need to be kept in perspective, say research- 
ers. History shows that new pathogens that pose a large epidemic threat 
are “very rare’, says Adrian Hill, a specialist in infectious diseases and 
director of the Jenner Institute in Oxford, UK. So are those that quickly 
kill many of those infected — the type that film plots thrive on. Many 
emerging epidemics, such as that of multidrug-resistant tuberculosis, 
move more slowly, yet cumulatively can kill many more people than the 
acute outbreaks that attract most of the media and political attention. 
But when they happen, large, acute epidemics can cause devastating loss 
of life and major economic damage, and the panic and chaos they gener- 
ate can do more harm than the pathogen itself. The Ebola epidemic is 
not over, and there are concerns it could spike again. 

“Ebola has been a wake-up call, not just for Africa, but for the 
world,’ said Margaret Chan, director-general of the World Health 
Organization (WHO) in March. “The world must never again find 
itself in such a position” 


Graves dug in 
Freetown, Sierra 
Leone, to cope with 
those dying from Ebola 
in late 2014. 
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The greatest new epidemic threats are unknown pathogens that 
spread easily — for example, through the air — and to which humans 
have little or no immunity. The world’s last brush with anything coming 
close was in late 2002, when the virus causing severe acute respiratory 
syndrome (SARS) caused a outbreak in humans in Guangdong prov- 
ince, China, then quickly fanned out into 29 countries — infecting at 
least 8,098 people and killing 774 of them — before a massive interna- 
tional response brought it under control. If that virus had spread just a 
bit more easily, it might have killed many more. “SARS probably came 
close to becoming an out-of-control pandemic,’ says Morens. “I think 
of SARS as one of our scariest close calls.” 


HOW TO DETECT THREATS 

Like SARS, which is thought to have originated in bats, most future 
infectious diseases will come from animals; some three-quarters of new 
human diseases have emerged this way. Scientists suspect that the cur- 
rent Ebola outbreak originated when the virus passed from fruit bats to a 


“SARS PROBABLY CAME CLOSE TO BECOMING AN 
OUT-OF-CONTROL PANDEMIC. | THINK OF SARS AS 
ONE OF OUR SCARIEST CLOSE CALLS,” 


two-year-old boy playing in a forested region of southern Guinea; Mid- 
dle East respiratory syndrome (MERS), a viral disease that emerged in 
2012, is probably transmitted by camels. And just last month, research- 
ers reported that three squirrel breeders in Germany who had died of 
encephalitis were killed by a novel bornavirus that had been carried by 
the animals. 

In theory, this knowledge could help the world to prepare. Scientists 
could carefully monitor viruses in animal populations and in people living 
nearby to identify potential threats, such as any that show some ability to 
cross the species barrier. Such basic research might allow scientists to get a 
head start on developing vaccines and drugs. But the science of predicting 
such threats is in its infancy. Scientists know little about what allows an 
animal pathogen to infect humans or to then spread between them, pro- 
cesses that depend on many factors, including its ability to enter human 
cells and replicate there. “Of all our gaps in knowledge, the worst gap is 
how little we know about the mechanisms of emergence,’ says Morens. 

To make matters worse, the vast majority of infectious-disease research 
and surveillance is in developed countries, but most emerging and 
re-emerging diseases are in the developing world. “We need to be where 
the diseases are, and where they are likely to emerge, studying them at 
their source, not sitting in labs in US science buildings,” says Morens, 
who is currently working on Ebola in Guinea. 

Robert Garry, a virologist at Tulane University in New Orleans, Loui- 
siana, is working with African scientists in an international project — the 
African Center of Excellence for Genomics of Infectious Diseases, based 
at Redeemer’s University in Redemption City, Nigeria. The project, 
which began in May last year, is taking blood samples from villagers in 
the region who have fevers, and using next-generation genetic sequenc- 
ing of the samples to discover new pathogens, as well as developing diag- 
nostics for both new and known ones. Supported by the US National 
Institutes of Health and the World Bank, it has an initial four-year budget 
of around US$8 million. 

Researchers do have some clues to guide their search for threats. They 
know that factors such as geography, climate and culture can help to 
identify hotspots of disease emergence, with most at lower latitudes. And 
it is clear that a major driver is contact between animals and humans. 
The EcoHealth Alliance, an international network of scientists centred 
in New York City, and the US Agency for International Development’s 
Emerging Pandemic Threats programme are carrying out viral sampling 
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from animals and people in hotspots across the world, and trying to tease 
out how farming, trade, deforestation and hunting and consumption of 
bushmeat influence the emergence of diseases. 

Such projects have led to the discovery of hundreds of viruses includ- 
ing arenaviruses, phleboviruses, coronaviruses and rhabdoviruses — and 
are likely to yield many more in the future, says Garry. But even when 
researchers do find new viruses, it is difficult to say which of them might 
pose a major threat. Few people would have anticipated that HIV/AIDS, 
the world’s largest recent pandemic, would be caused by a retrovirus, 
part ofa viral family that had not previously been associated with major 
infectious disease, Garry says (see ‘Emerging threats’). 

Some hints can be found by examining the affinity of viruses for 
receptors on human cells and assessing how well they spread between 
animals in the lab. These approaches are perhaps most advanced for flu 
viruses, which cause pandemics every few decades, of varying severity. 
Researchers around the world try to rank the potential pandemic risk 
of flu viruses using a battery of criteria, including the pathogens ability 
to infect or transmit between ferrets, whether they can bind to human 
receptors, and to what extent the human population has immunity. This 
information is used to prioritize the development of vaccines against 
those that seem more threatening. But it cannot predict which flu viruses 
might go pandemic. 

Researchers know that more could and should be done. One of the 
most important tasks is to establish local medical and research systems 
that can quickly analyse what is going on when a cluster of people sud- 
denly comes down with serious disease. Such systems, which are often 
underdeveloped in poorer countries, require a trained local workforce of 
microbiologists, epidemiologists and clinical scientists, and diagnostics 
laboratories capable of testing clinical samples for a wide range of dis- 
eases. These could be implemented in a low-income country for as little 
as $12 million annually, according to Jeremy Farrar, director of the UK 
biomedical charity the Wellcome Trust, who helped to establish such a 
system in Vietnam. 

But right now, surveillance systems are just as limited as scientists’ 
knowledge of emerging threats. So the current reality is that we will prob- 
ably be alerted to the next human epidemic or pandemic only once it is 
well under way. 


HOW TO RESPOND 

At that point, the world must respond — fast. For Ebola, it did not. The 
initial outbreaks occurred in December 2013, but Ebola was only identi- 
fied as the cause at the end of March 2014, by which point the outbreak 
had already spread. Early alarms by the humanitarian organization 
Médicins Sans Frontiéres (MSF; also known as Doctors Without Bor- 
ders) were ignored, and the international response did not kick into high 
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24 | NATURE | VOL 524 | 6 AUGUST 2015 


t 


3. Small outbreaks 

Pathogens that spill over and 
then spread between just a few 
have caused human disease in spread further. people. 
the past. 

Examples: Examples: Examples: 


gear until September (see Nature 513, 469; 2014). “Ebola spun out of con- 
trol because ofa lack of political leadership, will and accountability — not 
because of insufficient funding, early-warning systems, coordination or 
medical technologies,’ Joanne Liu, international president of MSE, told 
a gathering of health leaders in May. 

This was not how it was meant to be. In 2005, all 196 countries adopted 
a set of laws called the International Health Regulations, which were 
designed to improve the response to disease outbreaks. The regula- 
tions — effectively the world’s emergency action plan — were spurred 
by the SARS epidemic, and by outbreaks of H5N1 avian flu virus. 

But Ebola revealed how weak the regulations are. They mostly tasked 
individual countries with dealing with outbreaks — setting targets for 
them to reinforce their capacities for disease surveillance and response 
by 2012 — but did not include support to help the poorest countries 
reach those goals. This weakness has long been recognized, but not 
acted on — an “elephant in the emergency room’, says David Fidler, a 
specialist in international and national security law at Indiana University 
Bloomington. Ten years after the treaty was adopted, two-thirds of its 
signatories have yet to meet the targets. 

The regulations also failed to create an international rapid-response 
group to deal with a major outbreak. The WHO has never had outbreak- 
response teams on the scale needed to deal with an epidemic as large as 
Ebola, says Fidler, and what capacity it had has been slashed by budget 
and staff cuts. “What we are seeing in the Ebola crisis is the lack of a 
global public-health expeditionary capability that can handle something 
on acountry or regional scale,” he says. 

Governments and international organizations are now considering 
a raft of proposals to prevent the next serious outbreak from growing 
into an epidemic. These include boosting financial support for surveil- 
lance and outbreak response in low- and middle-income countries, and 
reform of the WHO, which has come under fire for its slow response to 
Ebola. One idea is to create a Centre for Emergency Preparedness and 
Response within the WHO but autonomous from it to avoid the agency's 
notorious politicization and bureaucracy. The body would link to other 
United Nations’ agencies, the World Bank, philanthropic organizations, 
non-governmental organizations and industry. It would create an inter- 
national reserve force that could be rapidly deployed to an outbreak, and 
be able to call up the planes and helicopters often needed to quickly ship 
large amounts of medical equipment to regions in need. 

The World Bank, the WHO and other organizations are also working 
on the idea of a Pandemic Emergency Facility that could swiftly send 
contingency funds to cover the efforts of the WHO, governments and 
other bodies in the event of a serious outbreak. 

The question now is whether these grand plans will become a reality. 
Many people hoped that these and other measures to reinforce outbreak 


The size and severity of disease outbreaks depends on where the causal agent sits in an 
evolutionary spectrum, ranging from animal viruses that have yet to leap to humans, to 
pathogens that have evolved to spread easily between humans. 
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Public-health experts fear a repeat of the 1918 flu pandemic which killed as many as 50 million people. 


preparedness and response would receive firm pledges at the June sum- 
mit of G7 industrialized countries in Germany. But although the sum- 
mit produced supportive language, it did not make concrete decisions, 
something that disappoints Manica Balasegaram, executive director of 
MSF’s Access Campaign in Geneva, Switzerland. “We need money put 
on the table, we need political commitment and funding,’ he says. 

But Farrar says that the high-level political attention is a good sign. 
He notes that the G7 has previously delivered on major public-health 
initiatives, such as helping to create the multibillion-dollar Global Fund 
to Fight AIDS, Tuberculosis and Malaria in 2002, two years after it was 
first proposed. What emerged from the G7 this year “has to be seen as 
setting a tone and a direction,’ Farrar says. “What’s key is what then 
comes out of the language.” 


HOW TO GET VACCINES AND DRUGS 
Even if the world reacts quickly to an emerging outbreak, it has to have 
effective tools to deploy. A vaccine could have stopped Ebola in its tracks, 
but the only ones available had not been tested in humans. Drugs, too, 
were stuck in the experimental phase. In this and other outbreaks, health- 
care workers often have to rely on centuries-old public-health measures, 
such as quarantine, chemical disinfection and encouraging hand wash- 
ing — essential, but often not enough. 

If a worst-case epidemic hit tomorrow, the script would probably be 
the same. The problem, say public-health officials, lies in how global drug 
and vaccine development is set up. The process is left largely to major 
pharmaceutical companies, which are geared towards treating those who 
can pay — developed-world inhabitants with mostly developed-world 
diseases — rather than to addressing the most pressing global health 
needs, which are often infectious diseases in the developing world. “What 
humanity actually needs isn't part of the equation,’ says Morens. “It’s what 
can make big bucks.” 

That there were even candidate vaccines and drugs for Ebola was 
largely down to spending on biodefence rather than concerns about 
global health, says Balasegaram. And there are few, if any, effective drugs 
and vaccines for a host of other epidemic threats and neglected diseases 
ranging from SARS to dengue — leaving the world defenceless against 
almost all the pathogens most likely to cause the next epidemic. 

After Ebola, “there is a real opportunity to change the status quo’, says 
Jean-Francois Alesandrini, a spokesman for the Drugs for Neglected 
Diseases Initiative (DNDi), a non-profit body working on long-ignored 
diseases such as leishmaniasis. 

Ina paper published in May, leading researchers 
and public-health officials proposed the creation 
of an international not-for-profit pharmaceutical 
body, bringing together research organizations, 
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governments, charities and private pharmaceu- 
tical companies that would research, develop 
and manufacture medical countermeasures 
for the many global-health threats for which 
there is little or no market (M. Balasegaram et 
al. PLoS Med. 12, e1001831; 2015). 

Such efforts have precedent in public-private 
partnerships (PPPs) that have sprung up over 
the past 15 years, including the DNDi. The pro- 
posed initiative would be similar, but writ large: 
with proposed funding of $10 billion annually, 
it would focus not only on emerging epidemic 
threats, but also on existing neglected diseases 
and developing much-needed new antibiot- 
ics. This would share limited resources, ensure 
sustained financing, and allow more coherent 
long-term planning. “There is no PPP for out- 
break pathogens. It is time to create one,” says 
Hill. “Ifthis doesn’t happen soon, the opportu- 
nity will be lost as global attention moves on.” 

Pharmaceutical companies are generally 
supportive of the proposal — which is crucial, because such ventures 
typically need access to the vast drug libraries, vaccine-technology plat- 
forms and manufacturing capacity that only industry possesses. 

Within such a scheme, Hill favours the immediate and accelerated 
development of vaccines against priority threats such as MERS and Mar- 
burg — a virus from the same family as Ebola that kills most of those it 
infects. He suggests foregoing the slow, costly animal studies that require 
high biosafety and biosecurity labs to contain the viruses, and instead 
developing small batches of vaccine that could be put directly through 
phase I safety and dosage testing in humans. If the vaccines were safe, and 
generated a good immune response, it is likely they would work, he says. 
Stockpiles could then be created, ready for phase II efficacy trials to start 
as soon as an outbreak occurs — so that it “can be nipped in the bud’, Hill 
says. Researchers are encouraged by the announcement this week that 
a clinical trial of an Ebola vaccine has had positive results (see page 13). 

But that still leaves the unknown pathogens, which are harder to pre- 
pare for. One option in such an outbreak would be to transfuse patients 
with the plasma of survivors, whose blood is often rich in antibodies 
specific to the virus, says Ian Lipkin, a virologist and outbreak specialist 
at Columbia University in New York. In many cases, this technique could 
provide a quick, ready-made therapy to an unknown pathogen, bypass- 
ing the years of research it can take to find drugs or vaccines. 

The approach gained prominence during the Ebola outbreak: clinical 
trials of ‘convalescent plasma for Ebola began in West Africa in Decem- 
ber (see Nature http://doi.org/6dr; 2014), and results are expected in 
coming months (Nature 517, 9-10; 2015). Lipkin would like to see the 
infrastructure for collecting and processing blood and plasma improved 
in poorer countries, where it is often lacking. 

Ideally, say researchers, clinical-trial designs would also be approved 
by regulators before an outbreak so that a trial could launch straight away 
(see page 29). This is already being done by researchers in the Interna- 
tional Severe Acute Respiratory and Emerging Infection Consortium, 
an international network of outbreak specialists based in Oxford that 
aims to develop generic clinical-trial protocols that can be adapted to 
any epidemic threat. 

Reforming the world’s epidemic response systems is not going to be 
easy, and public-health specialists are well aware that impetus might be 
lost as the Ebola epidemic fades from the limelight. But they also think 
that the shocking events in West Africa — bodies on the streets, nation- 
wide quarantines, economies collapsing — have left an indelible mark. 

The West African epidemic has been a “game-changer” in how the 
world prepares for a serious epidemic, says Morens. The era after Ebola, 
he hopes, will be very different from the one before it. m 


Declan Butler is a senior reporter for Nature in France. 
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People in Sierra Leone queue for food in March, during a country-wide lockdown intended to curb Ebola. 


Finish the fight against Ebola 


Leaders and health agencies are talking about ‘lessons learned’ from West Africa’s 
Ebola epidemic. But a major push is needed to end the outbreak, urges Joanne Liu. 


umbers of cases refusing to 
Nii new communities being 

infected, bodies buried in secret. 
Sound familiar? It should. But these are not 
just scenes from last year’s Ebola epidemic. 
They are playing out today in West Africa. 

As head of Médecins Sans Frontiéres 
(MSF; also known as Doctors Without 
Borders), which has treated one-third of 
reported Ebola cases in the outbreak, I have 
witnessed how a lack of political will under- 
mined the response in the early days of the 
epidemic. Now, fatigue and a waning focus 
are threatening the final push to end it. 

In the past few months, a stream of inter- 
governmental panels has been convened to 
appraise the international response to the 
epidemic — by the World Health Organi- 
zation (WHO), the World Bank and the 


G7 industrialized nations, among others. 
Meanwhile, an ever-growing list of philan- 
thropic and academic institutions is prepar- 
ing reports about ‘lessons learned; to prevent 
future outbreaks. These include Harvard 
University in Cambridge, Massachusetts, 
and the Institute of Medicine, a US non- 
governmental organization (NGO). 

Yet the Ebola epidemic in West Africa is far 
from under control. In the past three months 
alone, the number of cases — around 330 — is 
more than the third largest Ebola outbreak in 
history. Liberia, which was declared “Ebola- 
free’ in May, reported six cases by the end of 
June. And 20-27 cases 
have been confirmed 
across Guinea and 
Sierra Leone each 
week from mid-June 
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to mid-July. A large proportion of these cases 
cannot be traced back to the lists of people 
known to have been in contact with infected 
people. Over the past two months, cases 
have also emerged in Guinea's Boke prov- 
ince, which borders Guinea-Bissau, a coun- 
try with a weak health system and almost 
non-existent epidemiological surveillance 
and laboratory blood-testing capacity. 
Equally concerning is that governments 
and aid agencies are still failing to earn the 
trust of some communities in their efforts to 
combat the epidemic, even though numerous 
experiences in the past 18 months have dem- 
onstrated just how crucial this is. On 29 May, 
for instance, the Red Cross was forced to with- 
draw workers from the north Guinea town 
of Kamsar in Boke after two Red Cross cars 
and an employee’s home were attacked, 
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> anda warehouse containing equipment to 
enable safe burials was burnt down. 

Albert Einstein defined insanity as doing 
the same thing over and over again, and 
expecting different results. As a global health 
community, we cannot talk about fighting 
future epidemics more effectively when we 
have failed to incorporate the lessons learned 
in this outbreak to bring it under control. 

Getting to ‘no new cases in at least 42 days’ 
— twice the longest known incubation 
period for Ebola and the way in which the 
WHO defines the end of the epidemic — will 
require a major push. Ministries of health 
and aid agencies must do more to engage and 
empower communities in efforts to combat 
the disease, and to re-establish people's trust 
in government officials and health work- 
ers. The surveillance systems to locate and 
track new Ebola cases across Sierra Leone, 
Guinea and Liberia need to be properly 
supported — including in the districts that 
have not had an Ebola case for months. Gov- 
ernments, donors and NGOs must rebuild 
basic health-care infrastructure so that the 
countries affected can better deal with Ebola 
and the many other illnesses and conditions 
common to life in West Africa. 


IN HINDSIGHT 

In April 2014, near the start of the outbreak, 
MSF teams faced hostility in Guinea. There 
were cases of people in forest regions throw- 
ing rocks at ambulances. Such clashes have 
continued sporadically, involving our teams 
and those of other organizations. 

These attacks were not a surprise. Such 
events had been seen in previous haemor- 
rhagic-fever outbreaks and during the 2010 
cholera outbreak in 


Haiti. In the Ebola “There may 
epidemic, strangers be around 
showed up invillages 4,000 extra 
in what looked like maternal 
space suitsandtook deathseach 
away loved ones, year in the 
with onlyaroundhalf — }-¢ gion.” 


being seen again. At 
the peak of the epidemic, people were often 
not told when their relatives had died or 
were not given the chance to bury their dead 
according to custom. 

From these incidents, from our community 
teams, and from the success of other inter- 
national aid organizations, such as the Red 
Cross, we have learned how important it is to 
have the support of community leaders, both 
elected and traditional. Community buy-in to 
access affected or at-risk villages and towns, 
and talking through the importance of safe 
burials with families, is crucial. Since autumn 
2014, MSF teams have often carried out only 
one or two burials each day to allow enough 
time to explain to relatives of people who 
died from Ebola the importance of wearing 
protective clothing and disinfecting the body. 
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Six measures to end Ebola 


Isolate and care for patients. Quarantine 
infected people and provide medical and 
psychosocial support for patients and their 
families. 


Make burials safe. Disinfect corpses and 
ensure that people burying them wear 
protective clothing. 


Engage communities. Work with 
communities to help them understand the 
nature of Ebola, how to protect themselves 
and how to stem transmission of the 
disease. 


In the village of Télimédlé in Guinea, from 
May to July 2014, MSF ran an Ebola treatment 
centre near where the village elders stayed 
during the day. This allowed villagers and 
relatives to see their loved ones and how they 
were being cared for. Other initiatives, such 
as building windows in treatment centres to 
allow people to interact with infected family 
members, were received positively. 

The Liberian government similarly 
recognized the importance of engaging com- 
munities in a dialogue about Ebola. When the 
country was declared Ebola-free in early May, 
President Ellen Johnson Sirleaf acknowledged 
that the forced quarantine of West Point, an 
impoverished neighbourhood in the capital, 
Monrovia, may have done more damage than 
good by fuelling mistrust (in August 2014, 
tens of thousands of people in West Point 
were cordoned off for ten days without access 
to basic health care). 

Yet patients and at-risk communities 
continue to be treated at best as victims and 
at worst as biohazards and disease vectors, 
rather than as central to bringing the epi- 
demic under control. The government of 
Sierra Leone is still using coercive measures 
as the main tool to fight the outbreak. As 
soon as a case is confirmed, people who have 
been in contact with the patient are put in 
quarantine in their homes or in special facili- 
ties for 21 days. In Port Loko, quarantine has 
been imposed more forcefully, with security 
guards being stationed in front of the homes 
of people thought to be infected to prevent 
them or their families from leaving. 

In Guinea, quarantines — known locally 
as cerclage (encirclement) — started in July 
in some villages in the Forecariah and Boke 
provinces. From late June to late July, 25% of 
the cases in Sierra Leone and Guinea were 
identified after those infected had died in 
their communities, suggesting that people are 
still not recognizing or reporting the disease, 
or seeking care in specialized centres. 
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Support disease surveillance. Implement 
mechanisms to locate new cases 

easily, track probable transmission 
pathways, and identify sites that require 
disinfection. 


Trace contacts. Monitor those who have 
had contact with infected people to ensure 
quick referral for care if they fall ill. 


Re-establish health-care systems. 
Make medical care available for people 
with illnesses and conditions other 
than Ebola. JL. 


In the ongoing effort to combat Ebola, 
more needs to be done to rewrite the public- 
health narrative. It must move from one that 
has been infused with fear to one that recog- 
nizes the hope for survival that supportive 
care can offer infected people. 


RACE TO REBUILD 

A major threat to bringing the current 
epidemic under control is the devastation to 
the public-health systems in all three coun- 
tries (see ‘Six measures to end Ebola’). The 
nations, with help from the international 
community, must re-establish basic health- 
care and public-health measures such as 
infection control, precautions for handling 
blood and other bodily fluids, as well as 
triage. (Triage involves setting up systems 
through which patients can be quickly tested 
for various diseases and assigned to isolation 
areas until a diagnosis has been made.) 

The epidemic has left already-fragile health 
systems in tatters: at least 509 local health 
workers have died in the three affected coun- 
tries, and routine vaccination programmes 
for diseases such as measles, rubella, teta- 
nus and polio have all but ground to a halt 
in Sierra Leone and Guinea. Liberia restarted 
vaccinations efforts only in May. 

From January to June, there were 850 mea- 
sles cases in Liberia alone, and more than 
500 children contracted whooping cough 
over a similar period. The reluctance of par- 
ents to bring their children to health facilities 
to receive the regular schedule of vaccina- 
tions has undermined immunization rates. 

Malaria probably killed 10,900 more 
people in the three countries in 2014 than 
it would have done had the outbreak not 
happened, owing to the drop off in care’. 
Efforts to reduce maternal mortality rates 
in the countries — which already have some 
of the highest in the world — have faced a 
tremendous setback. An evaluation by the 
World Bank estimates that the loss of health 
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workers to Ebola will increase maternal 
mortality by 38% in Guinea, 74% in Sierra 
Leone and 111% in Liberia. In other 
words, there may be around 4,000 extra 
deaths each year in the region’ if current 
levels of care continue. 

Reinstating non-Ebola related health 
care, and re-establishing people's trust in 
it, is crucial to ending the epidemic — as 
well as to preventing a fresh wave of hard- 
ship for the people of West Africa. Without 
proper triage, people infected with Ebola 
may infect those who have a more treat- 
able condition, such as malaria. Moreover, 
people with various diseases and condi- 
tions may avoid seeking care for fear of 
being infected with Ebola. 

The departure in early July of French 
military medical teams from Guinea is 
worrying, as is the proposed withdrawal 
of Portuguese government teams support- 
ing laboratory capacity in Guinea-Bissau. 
International efforts must be redoubled; 
United Nations agencies, foreign aid 
teams and NGOs should not yet pull out 
of West Africa. 

Financial mechanisms being designed 
to combat future outbreaks, such as the 
World Bank’s Pandemic Emergency 
Facility, should be deployed immediately 
to allow affected countries and neigh- 
bouring ones to bolster their epidemic 
response and preparedness. Neighbour- 
ing nations should be incentivized to 
do active surveillance without worrying 
about the potential economic impact 
of declaring cases. And if multilateral 
financial instruments are not ready, then 
developed economies such as the United 
States, Canada, the European Union and 
Japan, should fill the gaps. And prelimi- 
nary results from a vaccine trial in Guinea, 
released on 31 July, are very promising’. 

This year, I have been invited to partici- 
pate in several expert panels focused on 
learning from this epidemic, to prevent 
history from repeating itself. Today, all 
the ingredients that enabled last year’s 
devastation are still with us: rainy sea- 
sons, an uncoordinated response, fear 
and distrust. We need to push through 
the fatigue and complacency, and put 
everything we have learned into action 
to end this epidemic. We must finish the 
fight against Ebola. m SEE NEWS FEATURE P.22 


Joanne Liu is international president 
of Médecins Sans Frontiéres in Geneva, 
Switzerland. 
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Physicians in Liberia attend to a person ill with suspected Ebola. 


Embed research 
in outbreak 


response 


Testing Ebola treatments in West Africa’s epidemic 
happened too late. Research response during future 
outbreaks must be more nimble, says Trudie Lang. 


ne year ago this month, the World 
() Health Organization (WHO) 

declared the Ebola outbreak in West 
Africa a public-health emergency. A tremen- 
dous national and international response 
followed as it became clear that the epidemic 
would be on a scale never seen before with 
this disease. 

As part of that response, in September 
2014, my colleagues and I were funded to 
establish protocols for clinical trials to evalu- 
ate possible treat- 
ments. We are part ODNATURE.COM 
of the Epidemic Dis- _ For Nature's special 
ease Research Group _ on Ebola, see: 
Oxford and were _ tiatlre.com/ebola 
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funded by the Wellcome Trust, a biomedical 
charity in London. 

Within an unprecedented 12 weeks, 
we had protocols approved by ethics 
committees, drugs to hand, and staff trained 
and ready to begin trials at a treatment centre 
in Liberia. It was thanks to an extraordinary 
collaborative effort involving the Univer- 
sity of Oxford, UK, the WHO, Médecins 
Sans Frontiéres (MSF; also known as Doc- 
tors Without Borders), researchers in West 
Africa and many others. 

But another six weeks then passed before 
we could start giving the drug to patients, 
mostly because of bureaucratic and logistical 
barriers (see ‘Timeline toa clinical trial’). > 
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> — There is still no proven treatment for 
Ebolaa year after the WHO announcement 
— in part because of the kinds of stum- 
bling blocks we encountered. As the rate 
of new infections slowed, it became clear 
that there would not be enough patients to 
test our drug on, and we stopped the trial 
in February. 

Government leaders must give the WHO 
the money and the support it needs to 
ensure that the world is ‘research ready’ for 
the next outbreak. A properly funded and 
empowered WHO could oversee the design 
and implementation of an on-call global 
task force of clinical-trial staff. It could also 
establish working groups to draw up tem- 
plate contracts so that agreements between 
the various players in a clinical trial can be 
signed off at speed. Most importantly, it 
could orchestrate outbreak research. 


CLINICAL COMPLEXITY 

It usually takes at least 18 months to set up 
a clinical trial. Myriad scientific, technical, 
regulatory and legal challenges must be 
resolved before any data can be collected’. 
This is why there are no proven treatments 
for diseases such as Ebola — there had never 
been any clinical trials conducted during a 
disease outbreak”’. Yet by mid-November — 
just three months after the WHO declared 
the outbreak a public-health emergency 
— we were ready to start testing the effects 
of brincidofovir on patients in Liberia in a 
treatment centre run by MSF. 

We achieved this in part by devising a clean 
and pragmatic protocol — it demanded no 
extra blood or other sampling over what 
would be carried out anyway as part of a 
patient’s standard care (see “Three successes’). 

Phase II and III trials, conducted since 
2012 in the United States and involving 
about 1,000 people, had already shown that 


TIMELINE TO A CLINICAL TRIAL 


During the Ebola epidemic, some of the steps in going 


brincidofovir could clear certain viral infec- 
tions from children and adults without caus- 
ing worrying side effects*®. And we knew 
that it would anyway be difficult to distin- 
guish side effects of the drug from symp- 
toms of an Ebola infection. So we decided 

to monitor only seri- 


“An ous and unexpected 
international, side effects, and oth- 
neutral body erwise only verify 
needs to be whether patients were 
put in charge alive seven days after 
of outbreak receiving the treat- 


ment (deaths from 
Ebola usually occur 
in the first few days after being admitted toa 
treatment centre.) 

Once it had been confirmed that a person 
was infected, staff first needed to take the 
patient through the process of informed con- 
sent. They then gave enrolled patients the 
tablets and made observations; if the patient 
vomited after taking the drug, for instance, 
staff would need to give them a second dose. 

Anything more complex, such as attempt- 
ing to assess all the drug’s possible side 
effects, would have meant monitoring 
several variables, from blood pressure to 
pain, in hundreds of patients. This would 
have increased both the risk of infection for 
the staff taking the measurements and the 
chance of introducing errors into the data. 

Another factor that speeded things up 
was the astounding way that everyone on 
the ground came together. The research eth- 
ics committees that we worked with in both 
Liberia and Sierra Leone must have been 
flooded with requests from the research 
groups wanting to conduct trials. Yet we 
received detailed and high-quality reviews 
of our proposals within days of submitting 
them — a process that would normally have 
taken at least three months. 


research.” 


rom receiving grant money to testing 


a candidate drug on a patient were achieved in record time. Other steps, such as getting 
agreement on contracts, must be completed much more quickly in the next epidemic. 


Grant awarded 

Data-management system in place 
Protocol for clinical-trial designed 
Drug selected 


Contracts drawn up and signed 


Approval by Oxford ethics committee 


Approval by Liberian ethics committee 
Import licence obtained for drug 


Drug exported from United States 


Final agreements between MSF, 
drug company and Oxford 
First patient given drug 


Week 1 23 4123 42123 4 


Sep 
2014 
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Rate of new 
infections 
starts to slow. 


1 2.34 1 


Oct Nov Dec Jan 
2015 
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However, we encountered major 
stumbling blocks when dealing with the 
following four issues. 


Difficulties in deploying African staff. 
We knew that hundreds of people in Africa, 
including nurses, clinicians and pharma- 
cists, had the skills and experience to set 
up and conduct a robust clinical trial. So in 
October, we put out a call for clinical-trial 
staff on the Global Health Network (www. 
theglobalhealthnetwork.org) — an online 
forum for medical researchers in low- and 
middle-income countries. Within 24 hours, 
we had received more than 250 replies from 
experienced African staff. 

Just a few days later, we realized that we 
would not be able to secure visas for the 
responders fast enough to ensure them ade- 
quate care should they become infected. In 
the end, we employed staff from the United 
Kingdom, Australia, France, Ireland and 
elsewhere — people who could be repatri- 
ated quickly if necessary. Although the visa 
problems did not stall progress, it would 
have been more appropriate and better for 
strengthening Africa’s research capacity and 
international ties if we had been able to use 
the skilled workers from African countries. 


Delays over contracts. In mid-November, 
we were again hampered by bureaucracy. A 
major difficulty was getting the legal contracts 
drawn up and agreed to by the various parties 
involved — the University of Oxford, the drug 
company Chimerix of Durham, North Caro- 
lina (which was supplying the brincidofovir), 
and MSE. MSF has shown tremendous leader- 
ship in the response to the Ebola epidemic but 
— appropriately — the organization is geared 
to delivering aid, not to facilitating research. 
Just as the epidemic began to show signs of 
slowing, we were delayed by six crucial weeks 
while waiting for contracts to be processed 
through MSF’s systems, which took longer 
than seemed necessary. 


An unorchestrated ‘land grab’. By the end 
of 2014, five research groups, including ours, 
were ready to start clinical trials for candi- 
date treatments. This meant that humanitar- 
ian agencies such as MSF, Save the Children 
and GOAL, as well as local health-care lead- 
ers, had to make difficult choices about what 
research to do where. 

For the trial that we conducted in Liberia, 
staff worked in pairs on 45-minute rota- 
tions to avoid overheating in the full-body 
suits that they had to wear in the treatment 
centre's ‘red zone. This meant that two trial 
staff could attend to only about five patients 
at one time. An obvious solution would have 
been to run the trial across multiple treatment 
centres simultaneously — but getting access 
to more centres was not feasible because of 
the complexity of the procedures and the 
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In future, visa issues must not prevent African health workers from helping with clinical trials. 


time involved. Although the various teams of 
researchers worked hard to collaborate, for 
instance by standardizing methods and shar- 
ing data, on the ground it felt as if we were ina 
chaotic ‘land grab for sites and patients. 


READY FOR NEXT TIME 

Despite the lack of a proven treatment 
for Ebola, our efforts and those of other 
researchers over the past year will have been 
worth it ifthey help to ensure that, next time, 
the global community is better prepared. 
Humanitarian organizations routinely 
mobilize diverse groups of people, includ- 
ing local workers, to help to deliver aid after 
earthquakes or tsunamis; research teams 
need to be mobilized just as quickly. 

First, an on-call global task force con- 
sisting of, say, 100-200 clinical-trial staff 
spread across five different countries 
should be established. This could be funded 
by agencies such as the Wellcome Trust, or 
by philanthropic organizations, such as the 
Bill & Melinda Gates Foundation (which 
partners with medical humanitarian chari- 
ties). These people should be employed in 
everyday studies and be trained for outbreak 
research so that they can be deployed imme- 
diately to coordinate a trial in the event of 
an epidemic. Research centres that are well 
positioned and located to handle outbreaks 
could collaborate and provide the missing 
diagnostic capacity by making their labora- 
tory expertise known and available. 


Second, contractual agreements between 
parties with stakes in a clinical trial will always 
be necessary. Probable snagging points — 
such as concerns over drug pricing or data 
ownership — are easy to predict and should 
be addressed to some degree ahead of time. 
According to one contract template, the 
company providing the drug would have, 
say, exclusive access to the data for a limited 
amount of time; in another, the data would 
be made public as soon as they are generated. 

Finally, an international, neutral body 
needs to be put in charge of outbreak 
research. Before the next outbreak, such 
a body could hammer out the details of 
crisis trial staffing and contracts. Most 
importantly, this organization could set the 
research priorities during an epidemic and 
ensure that adequate numbers of sites and 
patients are allocated to the different teams 
involved. The WHO is the obvious agency 
to do this but it currently lacks the necessary 
funds, mandate and support. 

In the case of the Ebola epidemic, instead 
of having multiple research groups, each 
struggling to complete their trial because of 
insufficient numbers of patients, the WHO 
could have directed all the teams to recruit 
patients for an agreed prioritized trial. This 
would have been a better approach scien- 
tifically and ethically. If a trial is stopped 
because of insufficient numbers of partici- 
pants, then every patient who has taken part 
in it has taken a risk needlessly. 
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EBOLA CLINICAL TRIALS 
Three successes 


The trial conductors (the Epidemic 
Disease Research Group Oxford) 
showed that clinical trials do not have 
to be expensive, slow and difficult. 


The clinical staff employed did an 
incredible job. Many had never been 
to Africa before, and were plunged into 
gruelling conditions. Their willingness 
to leave their families and work long 
hours without dropping standards 
speaks to the feasibility of an on-call 
global task force for clinical trials. 


The research teams set a precedent 
for data sharing, spurred by the 
International Severe Acute Respiratory 
and Emerging Infection Consortium. 
Teams agreed on what endpoints to 
measure in trials and standardized 
the types of data collected. They also 
shared experiences in meetings led 

by the World Health Organization, 
teleconferences and on a dedicated 
website (www.ebolaclinicaltrials.org). TL. 


To obtain a solid evidence base for the 
treatment, prevention and management of 
infectious diseases, everyone involved in 
outbreak response — aid agencies, minis- 
tries of health, health-care workers on the 
ground — needs to have research embed- 
ded in their plans long before an epidemic 
takes hold. Only then can experimen- 
tal treatments be tested within days, not 
months. m SEE NEWS FEATURE P.22 


Trudie Lang is professor of global health 
research at the Centre for Tropical Medicine 
and Global Health, University of Oxford, UK. 
e-mail: trudie.lang@ndm.ox.ac.uk 
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CLARIFICATION 

The Comment ‘Agree on biodiversity 
metrics to track from space’ 

(A. K. Skidmore et al. Nature 523, 403- 
405; 2015) referred to Copernicus as a 
European Space Agency (ESA) initiative. 
In fact, it is a European programme to 
which ESA contributes. 
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In the blink of an1 


Douwe Draaisma is impressed by a study on the science behind ‘maladies of the self’. 


t might happen while you are lecturing. 

All of a sudden, you hear yourself talking: 

an autopilot version of yourself seems 
to have taken over. With rising panic, you 
struggle to get back in, praying that what this 
autopilot has to say makes sense. 

Who — or what — is the ‘yourself’ that 
does the talking? And who is the T that 
anxiously tries to regain control? Are there 
temporarily two selves? Or is there still a 
single self, experienced from the outside? In 
most cases, the ‘split’ 
dissolves quickly and 
you slip back into the 
driver's seat. You have 
experienced a brief 
spell of depersonali- 
zation. 

Depersonalization 
can also be patho- 
logical, sometimes 


linked to epilepsy, The Man Who 

and can last for min- YVasn’t There: 

utes or even hours Investigations. into 
* the Strange New 


To science writer 
Anil Ananthaswamy, 
chronic types of 


Science of the Self 
ANIL ANANTHASWAMY 
Dutton: 2015 
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dissociation belong to “maladies of the self”, a 
set of experiences, conditions and syndromes 
that offer a window on what constitutes a self. 
For The Man Who Wasn't There, Ananthas- 
wamy interviewed patients, psychiatrists 
and neuroscientists, charting how the self is 
affected in people with autism spectrum dis- 
order, dementia, epilepsy or schizophrenia, 
and examining out-of-body experiences, 
doppelganger hallucinations and phantom 
sensations. Much of the book reads like a 
travelogue, an exploration of the fringes of 
human experiences with Ananthaswamy a 
dependable guide, as in his celebrated The 
Edge of Physics (Gerald Duckworth, 2010). 
However elusive the experiences may seem, 
he keeps analysis close to the findings of 
modern neuroscience and psychiatry. 
Ananthaswamy hears intimate, sometimes 
heartbreaking stories about what it means to 
experience a condition’s symptoms. He has 
a gift for weaving these through the techni- 
calities of neuroscientific literature. Auto- 
biographies hinging on conditions such as 
Asperger's syndrome and schizophrenia are 
proliferating, but there is little to fill the void 
between such accounts and the scientific 
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literature. Linking experiences with experi- 
ments, and individuals with numbers, Anan- 
thaswamy bridges that gap convincingly. 

Possibly the most harrowing malady of the 
self is Cotard’s syndrome, in which a person, 
often with severe depression, believes that he 
or she has died. Ananthaswamy presents the 
case of 48-year-old Graham. After a failed 
attempt to electrocute himself, he became 
convinced that he was brain dead. Scans 
showed severe loss of activity in the frontal 
and parietal regions of the brain — structures 
supporting the ‘default mode network, which 
allows one to remember and maintain the 
feeling that there is an T that acts and expe- 
riences. Investigators speculated that antide- 
pressants — or depression — could dampen 
activity in these brain areas but held that nei- 
ther hypothesis could explain the extent of 
the lowered metabolism. Cotard’s syndrome 
is philosophically unsettling, because it ques- 
tions the axiomatic 


certainty of the Carte- O NATURE.COM 
sian ‘I think, therefore  Formoreonscience 
Iam.’ Yet, Ananthas- _ inculturesee: 
wamy observes, there _ siaflire.coii/ 

must still bean T that booksandarts 
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experiences the delusion of being dead. 
More common, but equally ghostly, are 
phantom experiences. After amputation of 
a limb, some people still feel itching or pain 
from it, probably due to activity in the now- 
vacant part of their cortical ‘map; the neu- 
rological representation that supports their 
body image. Lesser known is the inverse, 
body integrity identity disorder, in which a 


re person feels that a 
‘Ananthaswamy healthy body part 


hearsintimate, js foreign to them. 


sometimes . The disorder may 
heartbreaking cause severe suf- 


stories about fering. Quite a few 
what it means desperate people 
to experience have taken it into 
symptoms.” their own hands to 


get rid of the prob- 
lematic body part, and have bled to death. 

In a moving chapter, Ananthaswamy 
travels with ‘David’ to an Asian surgeon who 
relieves him of a leg that has felt odd since 
childhood. Afterwards, David finally feels at 
one with his bodily self. Swiss neuropsychol- 
ogist Peter Brugger suggests that a limb that 
feels foreign may be the result of a cortical 
map that never included it in the first place. 

There are many such inversions in The 
Man Who Wasn't There. They make intrigu- 
ing associations. Could the feeling of a split 
self in depersonalization be the inverse of 
the ecstatic feeling of oneness with the world 
sometimes experienced during an epileptic 
seizure originating in the temporal lobe? 
(The brain region that is hyperactive dur- 
ing ecstatic seizures, the anterior insula, is 
underactive during chronic depersonaliza- 
tion, which seems to point in this direction.) 
Is the loss of a self supported by personal 
memories in Alzheimer’s disease analogous 
to the scrambling of the self in schizophre- 
nia? And could the trouble that some people 
with autism spectrum disorder have in intu- 
iting the mental states of others — which has 
been called a deficient theory of mind — also 
cause the less sophisticated introspective 
skills that they may have? 

Ananthaswamy does not end with a list of 
conclusions about the location, structure or 
organization ofa hypothetical self. One could 
hardly expect him to: most of the research is 
in flux, and has been especially so since the 
introduction of sophisticated imaging tech- 
niques. Instead, he gives a sense of the many 
forces — hormonal, chemical, psychological, 
social — that modulate the self-as-experi- 
enced. One finishes the quest with a sense of 
paradox that the concept of self, often seen as 
elusive if not illusory, is so eminently suited to 
tightening these various narrative threads. m 


Douwe Draaisma is professor of the history of 
psychology at the University of Groningen in 
the Netherlands. His latest book is Forgetting. 
e-mail: d.draaisma@rug.nl 
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Books in brief 


Katrina: After the Flood 

Gary Rivlin SIMON & SCHUSTER (2015) 

Ten years ago this month, New Orleans lay drowning, its levees 
breached by the storm surge from Hurricane Katrina. One million 
people were displaced and, despite federal preparedness exercises, 
the administration responded sluggishly. Journalist Gary Rivlin 
sweeps from street to boardroom in this history of the aftermath, 
studded with figures such as polarizing New Orleans mayor Ray 
Nagin and seething with egregious political failings that deepened 
racial inequality during the city’s recovery. As Rivlin sharply reminds, 
overcoming disasters is very much an issue of governance. 


The Black Mirror: Looking at Life Through Death 

Raymond Tallis YALE UNIVERSITY PRESS (2015) 

Death may be unimaginable, but former geriatric specialist Raymond 
Tallis explores it imaginatively nonetheless. Inspired by novelist 

E. M. Forster’s line from Howards End (1910), “Death destroys a man; 
the idea of death saves him”, Tallis’s meditation on his future corpse 
is ameshed march of philosophical musings and bald physical detail. 
As he sifts a lifetime’s worth of sensory and emotional memory, 
Tallis’s prose stuns like poetry — from the “crackling, rebellious 
stretching as paper balls unscrunch” to the self’s continuity despite 
the “distracted, multiple” nature of life. Enchanting. 


Behind the Binoculars: Interviews with Acclaimed Birdwatchers 
Mark Avery and Keith Betton PELAGIC (2015) 

Whether spotting golden eagles in Idaho or long-tailed tits in 
London, professional birdwatchers are a rare breed — observational 
dynamos wedded to their craft. Wildlife campaigner Mark Avery and 
birdwatcher Keith Betton have captured 20 stories (including their 
own) from British luminaries such as wagtail expert Stephanie Tyler 
and birder extraordinaire Lee Evans. This is both a serious overview 
of the field and a flock of delights, from the shot of a youthful Betton 
with three young song thrushes balancing on his forearm to fond 
memories of first binoculars, whether Leica Ultravids or Swarovskis. 


Pure Intelligence: The Life of William Hyde Wollaston 

Melvyn C. Usselman UNIVERSITY OF CHICAGO PRESS (2015) 

He was crucial to the development of crystallography, and 
discovered the amino acid cystine and the elements palladium and 
rhodium. Yet scientific polymath William Hyde Wollaston (1766- 
1828) is largely forgotten. This meticulous biography, the life’s work 
of late chemist Melvyn Usselman, reveals a man of indefatigable 
curiosity and methodological genius. As we see Wollaston crafting 
analytical instruments for Arctic expeditions, stargazing or showing 
scientific writer Mary Somerville the uses of a goniometer, we can 
only concur with Usselman that this was a “man worth knowing”. 


Sixty Degrees North: Around the World in Search of Home 
Malachy Tallack POLYGON (2015) 

If you follow Shetland’s latitude of 60° N around the world, you 

will encounter cultures “challenged by climate, by landscape, by 
remoteness”. So writes Shetlander Malachy Tallack in this powerful 
memoir detailing how, unmoored by his father’s death, he traversed 
that line to explore geographies inner and outer. Whether in Alaska’s 
Kenai peninsula dodging bears or among the Even people in “huge, 
cold and utterly strange” Siberia, Tallack is forever testing the 
psychological dynamics of the sense of belonging. Barlara Kiser 
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Inside the fear factor 


Susanne Ahmari applauds neuroscientist Joseph LeDoux’s redefinition of anxiety. 


ome 40 million people worldwide 

have been diagnosed with anxiety 

disorders. In Anxious, Joseph LeDoux 
presents a rigorous, in-depth guide to the 
history, philosophy and scientific explo- 
ration of this widespread emotional state. 
An eminent neuroscientist and author of 
The Emotional Brain (Simon & Schuster, 
1996) and The Synaptic Self (Viking, 2002), 
he offers a magisterial review of the role of 
mind and brain in the generation of both 
unconscious defensive responses and con- 
sciously expressed anxiety. 

LeDoux looks first at how our under- 
standing of anxiety has evolved. He starts 
with ancient etymology (the Greek angh 
signified constriction) and moves on to 
Sigmund Freud's view of anxiety as the “root 
of most if not all mental maladies’, and phi- 
losopher Soren Kierkegaard’s perspective 
on it as existential, evolving from the dread 
that stems from freedom of choice. He then 
lays out the core distinction between fear 
and anxiety. Fear he defines as anticipation 
of danger from a physically present threat 
(a grizzly bear in front of you); anxiety, as 
anticipation of an uncertain threat (potential 
predators roaming outside your tent). 

But although ‘fear’ and ‘anxiety’ are 
excellent descriptors of conscious feelings, 
LeDoux shows, they should not be used to 
describe the unconscious mental processes 
and neural circuits associated with these 
emotions. Instead of thinking of those pro- 
cesses as “fear stimuli activate a fear system to 
produce fear responses’, he proposes concep- 
tualizing them as “threat stimuli elicit defense 
responses via activation of a defensive sys- 
ten’. This is a subtle distinction, and LeDoux 
makes an excellent case that it is an important 
foundation for rigorous research into the 
neural underpinnings of the conscious and 
unconscious processes that subserve anxiety. 

He ranges broadly and deeply across 
molecular neuroscience, psychology and 
philosophy, yet his methodical approach 
keeps his argument clear. He starts with a 
cogent description of threat processing and 
conditioning based on the ‘fight vs flight 
vs freeze’ framework, and moves on to the 
neural circuits thought to underlie these 
responses. He gives a clear precis of the pos- 
sible mental processes behind anxiety disor- 
ders (such as impaired ability to discriminate 
between threat and safety). He then launches 
into his central thesis: that emotional states 
of mind such as consciously expressed anxi- 
ety are not inherited from our evolutionary 
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The brain’s amygdalae (yellow) are key to anxiety. 


ancestors. He argues that decades of animal 
research have failed to show that animals 
can consciously express feeling-states — 
in part because of their limited prefrontal 
cortical development and lack of verbal lan- 
guage. He does not rule out the possibility 
that animals can consciously feel emotions, 
but states that “it is not sufficient to provide 
evidence ... that the behavior in question is 
consistent with the existence of a conscious 
experience. One also has to show that the 
behavior cannot be accounted for by pro- 
cesses that work nonconsciously.” A dog may 
look ecstatic when given a meaty bone, but it 
is difficult to prove scientifically that it feels 
what we think of as ecstasy. 

The clinical importance of this distinction 
becomes clear in subsequent chapters, in 
which the unconscious and conscious brain 
processes involved in expressing fear and 
anxiety are ascribed to distinct neural circuits 
that may have different roles in the pathol- 
ogy and treatment of anxiety disorders. For 
example, when a hiker 
encounters a snake, 
information is rapidly 
and unconsciously sent 
from the eyes through 
the sensory thala- 
mus to the amygdala, 
which can trigger the 
hiker to freeze before 
she is aware of the 
problem. In a slower 
process, the visual cor- 
tex receives the same 
information from the 


Anxious: Using 
the Brain to 
Understand and 
Treat Fear and 


Anxiety 
JOSEPH LEDOUX thalamus, leading to 
Viking: 2015. conscious awareness 
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and identification of the snake. In people 
with anxiety disorders such as post-traumatic 
stress disorder, the unconscious rapid path to 
the amygdala may be too strong, leading to 
perception of threats when none exist. 

LeDoux’s views jibe with those of many 
researchers investigating the underpinnings 
of anxiety disorders, but they are not always 
acknowledged in the media and elsewhere. 
Importantly, Anxiety highlights that, although 
it may not be possible to study the verbally 
based human emotional experience using 
animal models, wise use of those models is 
crucial for progress towards treatments. They 
remain our only direct window into the mol- 
ecules, cells and circuits that guide emotions. 

Neuroscientists, psychologists, philoso- 
phers and psychiatrists will find this exqui- 
sitely referenced book particularly useful. It 
is also a must-read for young investigators, 
and anyone perusing the footnotes will be 
rewarded with an insider’s view of the state 
and evolution of anxiety research. LeDoux’s 
charming personal asides give an impression 
of having a conversation with a world expert. 

LeDoux ends on a high note, describing 
how cutting-edge research on the neural 
substrates of anxiety is being translated into 
new approaches for psychiatric treatment. He 
discusses, for example, the use of drugs that 
modulate glutamate-based synaptic transmis- 
sion to aid exposure therapy for conditions 
including phobias — a clinical improvement 
discovered through studies of threat-learning 
in rats. He also proposes adaptations to ther- 
apy protocols that could improve the efficacy 
of existing treatment. He suggests adjusting 
the timing of exposure-therapy sessions to 
maximize consolidation of new learning; ses- 
sions scheduled at night, for example, would 
allow that to happen during sleep without 
interference from the events of the day. 

Such ideas are unproven, and potentially 
difficult to translate into practice given cur- 
rent constraints on mental-health clinics and 
care providers. But they are a good example 
of the transformative potential of cross-talk 
between basic neuroscience researchers and 
clinicians. It is only through such synergistic 
collaborations that we will make significant 
advances in the treatment of anxiety. m 


Susanne Ahmari is director of the 
Translational OCD Laboratory at the 
University of Pittsburgh in Pennsylvania. 
She integrates cutting-edge neuroscience 
with clinical studies to develop treatments. 
e-mail: ahmarise@upmc.edu 
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Social impacts of 
science metrics 


Metrics used to gauge a 
researcher's productivity and 
importance to science can come 
at a social cost (J. Wilsdon Nature 
523, 129; 2015). Too often, such 
metrics are underpinned by 
values of questionable worth. 

Any quantitative measure of 
productivity will reward people 
who choose to work long hours, 
build large research teams and 
minimize their commitments 
to teaching, review panels and 
university committees. 

The use of such metrics 
can discourage people from 
sharing responsibilities and 
time with their partners or 
spouses, from investing in 
and enjoying their children’s 
lives, and from participating 
in their local communities. 
Researchers can feel forced 
to sacrifice ‘unproductive 
recreational pursuits such as 
holidays, sport, music, art and 
reading — activities that other 
metrics correlate highly with 
creativity and quality of life (see 
also J. Overbaugh Nature 477, 
27-28; 2011). 

We need a more nuanced. 
approach to academic 
evaluations for hiring, 
promotion and tenure. The 
emphasis on quantitative 
measures of productivity places 
unfair burdens on scientists and 
their families, and it discourages 
some students from pursuing 
academic careers. 

Stephen C. Harvey University of 
Pennsylvania, Philadelphia, USA. 
steharv@mail.med.upenn.edu 


Protect the young 
from e-cigarettes 


Democratic state senator Mark 
Leno is to be commended for 
trying to sustain California's 
leading position in protecting 
young people from the harmful 
effects of tobacco and nicotine, 
in all its forms (see Nature 523, 
267; 2015). Contrary to industry 
claims, the use of electronic 


cigarettes is increasing among 
young people who have never 
smoked before (R. E. Bunnell 

et al. Nicotine Tob. Res. 17, 
228-35; 2015), and not just 
among adult smokers searching 
for a less harmful alternative to 
cigarettes. 

Some e-cigarettes are 
designed to look like cigarettes, 
come in flavours that appeal 
to children and adolescents, 
and are promoted and sold in 
shops and pharmacies that are 
frequented by young people. 
Electronic cigarettes also 
deliver addictive nicotine; more 
research is needed on the safety 
of their other ingredients. 

Until more is known about 
these largely unregulated 
products, legislation similar to 
the bill that failed in California 
should be widely introduced 
to keep e-cigarettes and other 
electronic nicotine-delivery 
devices out of the hands of 
young people. 

Linda Richter CASAColumbia, 
New York, USA. 
lrichter@casacolumbia.org 


Support Nepal to 
rebuild sustainably 


A government report on Nepal’s 
earthquakes on 25 April and 

12 May, which caused around 
8,600 deaths and displaced at 
least 2.8 million people, rightly 
prioritizes the reconstruction 
of buildings and infrastructure 
(see go.nature.com/pdksq6). 
However, it overlooks 

the impact of large-scale 
restoration work on the fragile 
environment and imperilled 
ecosystems. The importance 

of this was learned from the 
extensive rebuilding in Aceh, 
Indonesia, after the 2004 Indian 
Ocean tsunami. 

A report on rebuilding in 
Aceh recommended addressing 
environmental degradation 
early in the redesign process 
to limit potential damage 
during reconstruction, 
with a view to minimizing 
deforestation and exploitation 


of natural resources (see 
http://go.nature.com/xpaxju). 
Likewise, the international 
aid community should support 
Nepal in using environmentally 
friendly reconstruction methods. 
The government must regulate 
the extraction of clay soil — in 
demand for producing trillions 
of fire bricks — because this 
can trigger landslides and erode 
fragile terrain. It should impose 
carbon-emissions standards 
on brick kilns and make them 
cleaner and more efficient, 
to cut pollution and wood 
consumption. (Deforestation 
has claimed around two-thirds 
of Nepal’s natural forest in 
30 years.) Controlling the 
excavation of gravel and sand 
from river beds would reduce 
the risk of diverting important 
currents, and would protect river 
ecosystems. 
Shiva Raj Mishra Nepal 
Development Society, Chitwan, 
Nepal. 
nedsnepal@gmail.com 


Plant collections find 
strength in numbers 


Preserved plant collections 
in the United States may be 
under threat (Nature 523, 16; 
2015), but there are grounds 
for optimism. Many herbaria, 
including those at our own 
institutions, are assembling 
digitized specimens in 
increasingly popular open 
databases. They are joining 
together to promote their value 
for research, teaching and other 
services, including the formal 
identification of species and to 
raise public awareness. 

Online information 
from plant collections is 
attracting positive attention, 
especially among younger 
scientists. Student interest 
is opening the eyes of 
university administrators. And 
crowdsourcing is educating a 
wide range of individuals as 
they collect information for 
herbarium databases. 

The Society of Herbarium 


Curators is an example of 
an international advocacy 
organization founded to 
preserve and promote 
endangered collections (www. 
herbariumcurators.org). Its 
regional networks reach out 
to groups that were previously 
under-represented in the 
botanical community, such as 
state and federal agencies, and 
schoolchildren and teachers. 
The society is developing 
community standards of 
curation and is ensuring that 
herbaria are fully used and not 
orphaned by their institutions. 
We advise every herbarium 
director to become a member: 
our strength lies in numbers. 
Conley K. McMullen James 
Madison University Herbarium, 
Harrisonburg, Virginia, USA. 
Andrea Weeks Ted R. Bradley 
Herbarium, George Mason 
University, Fairfax, Virginia, USA. 
memullck@jmu.edu 


High-rise buildings 
worsened heatwave 


This summer's heatwave in 
Pakistan was the worst in more 
than 30 years, and caused the 
deaths of more than 1,200 people 
in Karachialone. The city is 
an urban heat island that can 
reach temperatures up to 15°C 
warmer than those of its rural 
surroundings. Urgent and 
fundamental reform of local 
governance is needed to protect 
the city’s population of 22 million. 
Despite the 2010 Sindh High 
Density Development Board 
Act, high-rise development in 
Karachi has continued unabated. 
Besides obstructing life-saving 
sea breezes, these developments 
compound the city’s water and 
electricity shortages. Buildings 
are poorly ventilated and cheaply 
constructed from materials that 
are unable to cope with extreme 
temperatures. 
Abdur Rehman Cheema 
COMSATS Institute of 
Information Technology, 
Islamabad, Pakistan. 
arehmancheema@gmail.com 
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NETWORK SCIENCE 


Destruction perfected 


Pinpointing the nodes whose removal most effectively disrupts a network has become a lot easier with the development of 
an efficient algorithm. Potential applications might include cybersecurity and disease control. SEE LETTER P.65 


ISTVAN A. KOVACS 
& ALBERT-LASZLO BARABASI 


n enduring truth of network science is 
A“ the removal of a few highly con- 

nected nodes, or hubs, can break up 
a complex network into many disconnected 
components’. Sometimes, a fragmented and 
inactive network is more desirable than a 
functioning one. Consider, for example, the 
need to eliminate bacteria by disrupting their 
molecular network or by vaccinating a few 
individuals in a population to break up the 
contact network through which a pathogen 
spreads. In a quest to find the silver bullets 
that can effectively dismantle large networks, 
Morone and Makse’ (page 65 of this issue) 
have developed an algorithm that achieves this 
by identifying sets of network nodes known 
as influencers. 

It is not certain whether targeting and 
removing network hubs — defined as the 
nodes with the largest number of links — can 
inflict maximum disruption on a network. It 
may be more effective to eliminate a combi- 
nation of hubs and central, but less-well-con- 
nected, nodes. The removal of hubs is usually 
preferred because they are easy to locate, 
whereas identifying the optimal set of nodes for 
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Figure 1 | Optimal network demolition. Morone and Makse’ introduce an 
algorithm that allows them to efficiently dismantle networks. The authors 
define the collective influence of a network node as the product of its reduced 
degree (the number of its nearest connections, k, minus one), and the total 
reduced degree of all nodes at distance d from it (defined as the number of 
steps from it). a, In this network, for d=2, the red node with k=4 has the 
highest collective influence, because the total reduced degree of the nodes at 
d=2 from it (green and yellow circles) is 21. This yields a collective influence 
of 3x 21=63. The most connected hub, with k=6 (yellow circle), has a 
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which deletion would cause maximum damage 
is a non-deterministic polynomial-time hard 
(NP-hard) problem’. This means that it is com- 
putationally feasible only for small networks. 
Morone and Makse attack the problem of 
network disruption by mapping the integrity 
of a tree-like random network into optimal 
percolation*” theory. From this, they derive 
an energy function with a minimum that cor- 
responds to the set of nodes that need to be 
eliminated, to yield a network whose largest 
cluster is as small as possible. Although 
identifying this minimum is still an NP- 
hard problem, the authors were inspired by 
the energy function’s shape to find a simple 
algorithm that offers an approximate solution. 

To do this, Morone and Makse introduce 
the concept of collective influence, which is 
the product of the node's reduced degree (the 
number of its links minus one) and the sum 
of the reduced degrees of the nodes that are a 
certain number of steps away from it (Fig. 1). 
Collective influence describes how many 
other nodes can be reached froma given node, 
assuming that nodes of high collective influ- 
ence have a crucial role in the network. The 
collective-influence-based algorithm then 
sequentially removes nodes, starting with 
those that have the highest collective influence 
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(known as influencers) and recalculating the 
collective influence of the rest following each 
operation. The authors show that, for large 
networks, removing the set of influencers 
identified by this algorithm is more effective 
in fragmenting a network than removing 
the hubs, or than removing nodes that are 
identified through other algorithms, such as 
PageRank’ or closeness centrality’. The set of 
influencers identified by the authors contains 
many nodes with few connections. This high- 
lights the fact that the importance ofa node in 
ensuring a network's integrity is determined 
not only by the number of direct links it has to 
other nodes, but also by which other nodes it 
is connected to. 

The collective-influence algorithm is 
remarkable for its computational complexity 
because it requires only N’logN computa- 
tions to dismantle a network that contains N 
number of nodes. Its complexity is reduced to 
MogN if, instead of individual nodes, a fixed 
fraction of the total is removed at each step of 
the computation. The authors compare their 
method to the predictions of spin-glass theory, 
which was originally developed to describe 
the properties of disordered magnets and has 
found a range of applications in network analy- 
sis. They conclude that the nodes prioritized 


c 
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collective influence of 60. b, Removing the 6 nodes with the highest k (white 
circles) causes considerable damage to the network, but leaves a sub-network 
that contains 12 nodes unperturbed. c, By contrast, the algorithm developed 
by the authors allows them to identify a set of nodes (known as influencers) 
according to their collective influence. Using this, the removal of four 
influencer nodes (white circles) results in a fragmented network in which 
the largest connected cluster that remains has only ten nodes. This illustrates 
the algorithm's effectiveness over conventional methods for prioritizing 
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by the collective-influence algorithm represent 
an approximate solution, which has a size 
close to that of the theoretical optimal solu- 
tion. On the basis of spin-glass theory, we 
expect that the collective-influence solution 
has only a small overlap with the optimal solu- 
tion, and hence must be treated with caution. 
However, the influencers found by collective 
influence are more effective in destroying a 
network than nodes selected by other meth- 
ods. So even though the collective-influence 
method is approximate, it is faster and 
more efficient. 

As with any new algorithm, open questions 
abound. The collective-influence algorithm 
has only one free parameter — the distance, 
expressed in the number of steps, from any 
given node. At zero distance, the collective 
influence ofa node is equal to the square of its 
reduced degree, and so in this case the algo- 
rithm simply removes the hubs. To improve 
the algorithm’s accuracy, one must choose a 
non-zero distance — but one that is not too 
large, because for large distances the bounda- 
ries of the network are reached, diminishing a 
node’s collective influence (the collective influ- 
ence approaches zero). Although Morone and 
Makse find that any distance greater than one 
works, a firm criterion for choosing an optimal 
value is lacking and would be desirable. Finally, 
because the authors designed their algorithm 
to work on networks that are locally tree-like, 
further work and quantitative evidence are 
needed on its expected accuracy for networks 
with loops, such as most social networks. 

The collective-influence algorithm, just like 
similar algorithms, removes a node together 
with all its links. However, for many systems, 
node removal is too drastic an intervention. 
Softer touches, such as removing or rewir- 
ing specific links, are more tractable and 
desirable. For example, these approaches are 
relevant for networks in biological cells, in 
which many diseases are caused by mutations 
that result in deletion of links rather than the 
complete removal of nodes*. Understanding 
such ‘edgetic effects, and designing algorithms 
that can detect the minimum number of links 
to delete so as to achieve a given outcome, 
remains a challenge for future work. 

The identification of optimal influencers, 
at either the node or the link level, is the first 
step towards building networks that would 
be robust against both attacks and failures. 
Mastering the design principles of such 
super-robust networks could have profound 
implications for anything from cybersecurity 
to the design of an attack- and error-tolerant 
power grid, and may even allow us to develop 
drugs that can rescue a cellular network from 
its diseased state with minimal side effects. m 
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A smart insulin patch 


A microneedle-containing patch that is designed to sense elevated blood glucose 
levels and to respond by releasing insulin could offer people with diabetes a 
less-painful and more-reliable way to manage their condition. 


OMID VEISEH & ROBERT LANGER 


iabetes is widely recognized as one of 
D the biggest medical challenges of the 

twenty-first century, afflicting more 
than 280 million people globally’. People with 
diabetes must tirelessly self-monitor their 
blood glucose levels and inject the correct 
dose of the glucose-lowering hormone insu- 
lin to keep their blood glucose levels in the 
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Figure 1 | A microneedle patch to monitor 
glucose and release insulin. Yu et al." have 
developed a smart insulin-releasing patch made 
of 121 nanoparticle-containing microneedles. 

The patch painlessly penetrates the interstitial 
fluid between subcutaneous skin cells. The 
nanoparticles in each needle contain insulin and 
the glucose-sensing enzyme glucose oxidase, 
which converts glucose to gluconic acid. These 
molecules are surrounded by a hypoxia-responsive 
polymer. Increases in glucose oxidase activity in 
response to glucose elevation produce a low-oxygen 
environment in the nanoparticles, which is sensed 
by the hypoxia-responsive polymer, triggering 
disassembly of the nanoparticles and the release 

of insulin. 
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normal range’. This treatment regime involves 
challenges — it requires painful and inconven- 
ient subcutaneous injections, is imprecise, and 
can cause serious problems if insulin dosage 
is not closely tuned to the patient's immediate 
physiological needs*. Reporting in Proceedings 
of the National Academy of Sciences, Yu et al.* 
describe a glucose-responsive microneedle 
patch that can be painlessly applied to the skin 
and that releases insulin as blood glucose levels 
increase. 

‘Smart’ glucose-responsive insulin-based 
therapies involve the automatic release of insu- 
lin in response to increases in blood glucose 
concentration. Smart therapies can improve 
disease control and limit the potential for 
excessively low blood glucose levels, which is 
a potentially deadly effect of excessive insulin 
dosing*. To mimic the physiological needs 
of a patient accurately, such therapies must 
respond rapidly to elevated glucose levels, and 
must release insulin with kinetics that closely 
mirror those of a healthy pancreas. 

One type of smart therapy makes use of 
microcomputer-controlled insulin-delivery 
systems. These systems couple implant- 
able continuous glucose monitors (CGMs) 
to automated pumps, and administer insulin 
through a subcutaneously inserted cannula 
tube. They are currently being evaluated in 
the clinic, and have shown promise in helping 
patients to achieve their target blood glucose 
level more regularly*®. However, the sensors 
of current CGMs must be calibrated many 
times a day using hand-held glucometers. 
They produce blood-glucose measurements 
that lag behind true blood glucose levels by 
5-15 minutes, hampering efforts to maintain a 
healthy range’. They are also the size of pagers, 
and the implanted sensors and cannula 
increase the risk of infection and require fre- 
quent maintenance and replacement to combat 
the body’s immune response, increasing incon- 
venience, discomfort and cost to the patient? . 

The microneedle-patch device developed 
by Yu and colleagues is a 6-millimetre-square 
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relative of cyanase genes from nitrite-oxidizing 
bacteria. Because this single isolate is the only 
ammonia oxidizer ever found to produce 
cyanase, it is probable that along its evolution- 
ary journey it acquired the cyanase gene by 
horizontal transfer from a nitrite-oxidizing 
partner during nitrification. 

Nitrification is a cornerstone of the global 
nitrogen cycle because it both removes ammo- 
nia and produces nitrate, the latter fuelling 
many other pathways, including the eventual 
return of nitrogen gas to the atmosphere. 
Unabated nitrification, however, has undesir- 
able environmental effects, such as soil acidifi- 
cation, nitrate toxicity in drinking water, algal 
overgrowth (eutrophication) and oxygen 
depletion in coastal marine systems (‘dead 
zones’), and it contributes to global warming’. 

Although chemical fixation of nitrogen gas 
to ammonia by the Haber-Bosch process revo- 
lutionized global agriculture by circumventing 
the limitations of biological nitrogen fixation, 
we have now entered a period in which human 
sources of fixed nitrogen exceed all natural 
sources combined. This has brought the global 
nitrogen cycle to its current and serious state 
of imbalance’. The main regulatory focus for 
stemming runaway impacts of nitrification 
and nitrate has been the control of ammo- 
nium-based fertilizers — how these are applied 
and how to maximize their uptake by plants. 
Palatinszky and colleagues’ findings raise the 
question of whether cyanate should join ammo- 
nia (and urea, the only other known energy- 
providing substrate for ammonia oxidizers) as 
akey controller of the nitrification process. 

Scholars of prebiotic chemistry have 
demonstrated that cyanide can act as a build- 
ing block and substrate for generating essential 
components of living cells°. Extant micro- 
organisms’ and plants* have been identified 
that produce cyanide, take it up from the 
environment, transform it or even assimilate 
it. These findings argue for an ongoing and 
significant role of cyanide and its derivatives 
as nutrients and scaffolds for biopolymeriza- 
tion. Palatinszky and colleagues’ results elevate 
the role of cyanate (and perhaps cyanide as 
its reduced precursor) from a building block 
(and toxin) to an energy-supplying molecule. 
This places cyanate (and cyanide) in a new 
evolutionary context — one could envisage 
ecosystems here or on extrasolar planets in 
which a cyanate—cyanide cycle could support 
both assimilatory and dissimilatory modules 
of nitrogen metabolism. 

Cyanate on its own is chemically unstable 
and does not persist in large quantities in the 
environment. However, it can stably persist at 
low concentrations in seawater’ and is pro- 
duced inside cells by the decomposition of urea 
and the metabolite carbamoyl phosphate”. 
Nitrifying microorganisms, particularly in 
the oceans, are geared towards survival at 
extremely low nutrient levels. Palatinszky et al. 
suggest that nitrite-oxidizing bacteria either 
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produce more cyanate as a by-product of their 
distinctive metabolism or import cyanate on 
a continuous basis. Both of these possibilities 
argue that this involvement of cyanate sup- 
ports the formation and metabolism of nitrify- 
ing partnerships even when ammonia is scarce. 

Thus, this report is a good reminder that 
microorganisms defy our strict categoriza- 
tions into functional and phylogenetic groups 
because they evolve and survive in complex 
geochemical contexts. Nitrifying consortia 
provide an excellent illustration that microbial 
partnerships are dynamic and sometimes mys- 
terious, and often challenge our predefined 
boundaries. m 
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Uncertain future for 
vegetation cover 


How will Earth’s vegetation cover respond to climate change, and how does this 
compare with changes associated with human land use? Modelling studies reveal 
how little we still know, and act as a clarion call for further work. 


ALMUT ARNETH 


egetation and soil take up and release 

large amounts of carbon dioxide, and 

are thus key players in the climate sys- 
tem. Writing in Global Biogeochemical Cycles, 
Davies-Barnard et al.’ describe results from 
an Earth-system model that incorporates a 
dynamic component representing vegetation 
and the associated carbon cycle. They used 
this to investigate how change of vegetation 
in response to global warming and increasing 
atmospheric CO, levels compares, in terms 
of area and carbon uptake, with the effects of 
human land use — particularly deforestation 
and reforestation — over the coming decades. 
In total, the projected changes strongly 
depend on the type and location of future 
land-use change, and on the magnitude of 
climate change. 

In most parts of the world, humans have 
greatly altered the type of vegetation that dom- 
inates the landscape. Further large changes in 
land cover are expected as demand for food, 
timber and biofuels grows, and as the climate 
warms. Knowledge about the location of domi- 
nant vegetation cover in the future is needed 
for many reasons. Enhanced vegetation growth 
and the expansion of vegetation cover into 
new regions owing to climate change takes 
up CO, from the atmosphere, whereas large 
amounts of this greenhouse gas are lost from 
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vegetation and soil following deforestation. 
Changes in vegetation cover also alter the way 
in which incoming solar radiation is reflected 
or absorbed at the land surface, and how it 
subsequently warms the surface and leads to 
evaporation and transpiration of water. Taken 
together, vegetation and soil influence climate 
change globally and regionally. 

Despite this, only the most recent (fifth) 
report by the Intergovernmental Panel on 
Climate Change (IPCC) has simultaneously 
considered the effects of land-use change and 
of dynamically changing vegetation and soil 
processes — at least in a few simulations’ of 
future climate change using Earth-system 
models. By contrast, stand-alone vegetation 
models have been used for some years to 
assess the combined effects of natural vegeta- 
tion dynamics and deforestation’. However, 
climate scientists have rarely attempted to 
systematically disentangle the two, especially 
with respect to the area covered. 

In their modelling study, Davies-Barnard 
and colleagues show how three scenarios that 
consider both land-use change and climate 
change lead to substantial regional discrepan- 
cies in where, and by how much, vegetation 
cover expands or decreases (Fig. 1). The model 
shows that climate-change effects are larger 
in boreal forest than in tropical forest, and 
become more important towards the end of the 
twenty-first century. In fact, the one outcome 


that emerges from all three scenarios is the 
poleward expansion of boreal forest, a finding 
that has also been reported in previous work 
(see ref. 3, for example). By contrast, tropical 
forests are more affected by land-use change 
than are boreal ones, and the effects become 
evident in the next few decades, but the direc- 
tion and speed of change depends greatly on 
the scenario — for example, the ratio of the 
land area adopted for crop and pasture lands 
to the area of reforestation. 

Thus, a complex picture emerges in which 
changes in vegetation cover depend on the 
speed of vegetation’s response to human- 
induced forcing, whether warming and higher 
atmospheric CO, levels stimulate the expan- 
sion of forest cover, and the relative size of 
areas of deforestation and reforestation. To 
complicate matters further, the magnitude and 
direction of vegetation-area change and eco- 
system carbon changes are not proportional 
to each other. The regional differences asso- 
ciated with each scenario count, and not just 
because of their effects on climate. Changes in 
land cover will affect species and habitat diver- 
sity, but also water supplies, food provision, air 
quality and other services that society derives 
from ecosystems. A better understanding of 
where and when we can expect land-cover 
changes is therefore needed to develop sustain- 
able land-management strategies. 

As Davies-Barnard and co-workers note, 
there are several caveats to their analysis, 
some of which relate to the vegetation and 
carbon-cycle model used. In their study, the 
nitrogen and carbon cycles do not interact; 
such a lack of interaction can affect not only 
future carbon-cycle projections’, but also how 
simulated vegetation cover responds to climate 
and atmospheric CO, changes’. Furthermore, 
the representation of croplands is highly sim- 
plified in their model, and does not consider 
crop-management practices that are known to 
affect the carbon content of soil. 

Another caveat is that forest-management 
practices, the dynamics of forest regrowth 
and tree-age distributions are not accounted 
for in the authors’ model, but these are impor- 
tant for carbon cycling in ecosystems. And 
only net land-use changes — the net area that 
undergoes a change from one time period to 
the next — are considered, even though the 
accuracy of estimates of total land-use change 
and carbon-cycle calculations can be sub- 
stantially improved when the more-detailed, 
multidirectional changes that occur within a 
region are accounted for”. We do not know 
the degree to which Davies-Barnard and col- 
leagues’ results would be affected if all of these 
caveats were explicitly addressed. Their study 
will therefore stimulate and challenge scientists 
to account for land-use and land-cover change 
much more realistically than is done at present. 

Even more interesting is how much the 
study’s findings depend on the envisaged 
future world. In the fifth IPCC report, four 
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Figure 1 | Simulations of future forest 

cover. Davies-Barnard et al.' have used a 
computational model to investigate how the 
change of vegetation cover in response to 

global warming and increasing atmospheric 

CO, levels compares with the effects of land 

use (deforestation and reforestation) over the 
coming decades. The graphs depict changes in the 
percentage of the global land area covered by forest 
in 2100, using three different scenarios of climate 
change and land use; results from each scenario 
are shown in a different colour. The results differ 
greatly for each scenario. (Adapted from ref. 1.) 


future anthropogenic emission scenarios 
(known as representative concentration path- 
ways) were each realized by a different inte- 
grated assessment model, which combines 
knowledge about aspects of climate change 
and economics into a single framework. The 
uncertainties associated with projections 
of land-use change are therefore unknown, 
even though different outcomes of land-use 
change are feasible for each of the scenarios. 
However, the uncertainties in land-use change 
— in terms of the total area, location and direc- 
tion of change — will need to be considered 
to develop land-based policies for mitigating 
and adapting to the effects of climate change. 
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Scientists are addressing this issue by 
developing a broader range of land-use change 
projections, using different integrated assess- 
ment models, for each of the representative 
concentration pathways used in the IPCC 
report’. In addition, projections from global 
and regional models of land-use change that 
are conceptually different from integrated 
assessment models are emerging or are under 
development*’. We will soon be able to test 
how components of the future carbon cycle 
and the climate, and of many other crucial 
ecosystem properties, will alter when a range of 
CO, levels and climate changes are combined 
with various land-use-change scenarios. This 
will help us to answer the overarching question 
of how to share a finite resource: the land. = 


Almut Arneth is in the Department of 
Environmental Atmospheric Research, 
Institute of Meteorology and Climate 
Research, Karlsruhe Institute of Technology, 
Garmisch-Partenkirchen 82467, Germany. 
e-mail: almut.arneth@kit.edu 


1. Davies-Barnard, T., Valdes, P-J., Singarayer, J. S., 
Wiltshire, A. & Jones, C. D. Global Biogeochem. 
Cycles 29, 842-853 (2015). 

2. Ciais, P. et al. in Climate Change 2013: The Physical 
Science Basis. Contribution of Working Group | to the 
Fifth Assessment Report of the Intergovernmental 
Panel on Climate Change (eds Stocker, T. F. et al.) 
465-570 (Cambridge Univ. Press, 2013). 

3. Sitch, S. et al. Glob. Change Biol. 14, 2015-2039 
(2008). 

4. Warlind, D., Smith, B., Hickler, T. & Arneth, A. 
Biogeosciences 11, 6131-6146 (2014). 

5. Stocker, B. D., Feissli, F., Strassmann, K. M., Spahni, R. 
& Joos, F. Tellus B 66, 23188 (2014). 

6. Wilkenskjeld, S., Kloster, S., Pongratz, J., Raddatz, T. 
& Reick, C. H. Biogeosciences 11, 4817-4828 (2014). 

7. O'Neill, B. C. et al. Clim. Change 122, 387-400 
(2014). 

8. van Asselen, S. & Verburg, P. H. Glob. Change Biol. 
18, 3125-3148 (2012). 

9. Murray-Rust, D. et al. Environ. Model. Software 59, 
187-201 (2014). 

10.Arneth, A., Brown, C. & Rounsevell, M. D. A. Nature 
Clim. Change 4, 550-557 (2014). 


Ribosomal ties that bind 


The ribosome is the cellular complex of proteins and RNA molecules that synthesizes 
proteins. An artificial ribosome in which the two main subunits are tethered 
together creates opportunities for engineering this process. SEE LETTER P.119 


JOSEPH D. PUGLISI 


o engineer a system is to demonstrate 
a mastery of physical understanding. 
Mechanical engineers harness a deep 
understanding of fundamental physics to 
design new motors. Similarly, biologists are 
using the current explosion in information 
about molecular structure and function to engi- 
neer biological systems. The ribosome — the 
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macromolecular complex containing RNAs 
and proteins that translates the genetic code — 
represents one of nature’s most sophisticated 
machines. Engineering ribosomes would 
enable experimental manipulation of protein 
synthesis and provide deeper insights into 
cellular and molecular biology. On page 119 
of this issue, Orelle et al.' describe drastic, but 
simple, engineering of functional ribosomes, in 
which two separate subunits are linked as one. 
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Comprehensive genomic profiles 
of small cell lung cancer 


A list of authors and affiliations appears at the end of the paper 


We have sequenced the genomes of 110 small cell lung cancers (SCLC), one of the deadliest human cancers. In nearly all 
the tumours analysed we found bi-allelic inactivation of TP53 and RB1, sometimes by complex genomic rearrangements. 
Two tumours with wild-type RBI had evidence of chromothripsis leading to overexpression of cyclin D1 (encoded by the 
CCND1 gene), revealing an alternative mechanism of Rb1 deregulation. Thus, loss of the tumour suppressors TP53 and 
RB1 is obligatory in SCLC. We discovered somatic genomic rearrangements of TP73 that create an oncogenic version of 
this gene, TP734ex2/3. In rare cases, SCLC tumours exhibited kinase gene mutations, providing a possible therapeutic 
opportunity for individual patients. Finally, we observed inactivating mutations in NOTCH family genes in 25% of 
human SCLC. Accordingly, activation of Notch signalling in a pre-clinical SCLC mouse model strikingly reduced the 
number of tumours and extended the survival of the mutant mice. Furthermore, neuroendocrine gene expression was 
abrogated by Notch activity in SCLC cells. This first comprehensive study of somatic genome alterations in SCLC 
uncovers several key biological processes and identifies candidate therapeutic targets in this highly lethal form of cancer. 


Small cell lung cancer (SCLC) accounts for approximately 15% of all 
lung cancers, arises in heavy smokers, and the tumour cells express 
neuroendocrine markers. Although chemotherapy is initially effective 
in the treatment of SCLC, recurrence arises rapidly in the vast major- 
ity of cases, usually killing the patient within only a few months’. 
SCLC is rarely treated by surgery and few specimens are available 
for genomic characterization. Previous studies applying mostly 
exome sequencing in a limited number of tumour specimens have 
revealed only a few recurrently mutated genes””. 

We hypothesized that complex genomic rearrangements, which are 
undetectable by exome sequencing, might further contribute to the 
pathogenesis of SCLC and thus performed whole-genome sequencing 
of 110 human SCLC specimens (Supplementary Tables 1-4). One of 
the hallmarks of SCLC is the high frequency of mutations in TP53 and 
RBI (refs 2-7). As mice lacking Trp53 and Rb1 in the lung develop 
SCLC*”, we also sequenced 8 of these murine SCLC tumours in order 
to identify mutations that may promote SCLC development following 
loss of Trp53 and Rb1 and that may overlap with such accessory genes 
in human SCLC” (Supplementary Table 5). 


Samples and clinical data 


We collected 152 fresh-frozen clinical tumour specimens obtained 
from patients diagnosed with stage I-IV SCLC under institutional 
review board approval (Supplementary Table 1 and Extended Data 
Fig. 1). The tumour samples were enriched for earlier stages and 
consisted of primary lung (n = 148) and metastatic tumours (n = 4) 
obtained by surgical resection (n = 132), biopsy (n = 4), pleural effu- 
sion (n = 1) or through autopsy (m = 15). We performed whole-gen- 
ome sequencing on 110 of these tumours and their matched normal 
DNA. A total of 42 cases were excluded from the analysis because of 
insufficient quality or amount of DNA. Most of these 110 tumours 
were treatment-naive, with only five cases obtained at the time of 
relapse. We analysed transcriptome sequencing data in 71 of the 
110 specimens that had undergone genome sequencing and in 10 
additional specimens. Finally, 103 of the 110 genome-sequenced spe- 
cimens and 39 additional specimens were analysed by Affymetrix 6.0 
SNP arrays (Supplementary Table 1 and Extended Data Fig. 1). Eight 
tumour samples from preclinical SCLC mouse models were analysed 


by whole-exome sequencing (n = 6) or whole-genome sequencing 
(n = 2) (Supplementary Table 5). 


Recurrent somatic alterations in SCLC 


SCLC genomes exhibited extremely high mutation rates** of 8.62 
nonsynonomous mutations per million base pairs (Mb). C:G>A:T 
transversions were found in 28% of all mutations on average, a pattern 
indicative of heavy smoking (Fig. la and Supplementary Tables 2 
and 3). The smoking history or clinical stage of the tumours did not 
correlate with the type and number of mutations (Extended Data 
Fig. 2). The median tumour content was 84% (Extended Data 
Fig. 3a and Supplementary Table 2). By contrast, murine SCLC 
tumours showed a low number of somatic alterations (on average 
28.5 protein-altering mutations per sample on average)’® (Supple- 
mentary Table 5). 

In order to assess the amount of genetic heterogeneity of SCLC, we 
developed a subclonality score, which can be interpreted as the prob- 
ability that an arbitrary point mutation in a randomly selected cancer 
cell is subclonal throughout the entire tumour (Methods). A reliable 
reconstruction of the subclonal architecture was possible in 55 of the 
cases (Extended Data Fig. 3b). A comparison to lung adenocarci- 
noma" indicated a threefold lower subclonal diversity in SCLC 
(P = 0.00023, Extended Data Fig. 3b), pointing to pronounced differ- 
ences in the evolution of SCLC and lung adenocarcinoma’*’. In 
contrast to adenocarcinomas, the level of heterogeneity in SCLC did 
not correlate with clinical stage (Extended Data Fig. 2b). 

We applied several analytical filters in order to identify mutations 
with a probable relevance in SCLC biology in the context of the high 
load of background mutations” (Extended Data Fig. 1, Supplementary 
Table 6 and Methods). They include (I) analyses of significance 
determined by a comparison of observed and expected mutation 
rates followed by a correction for expressed genes, (II) a survey of 
regional clustering of mutations that may indicate mutational target- 
ing of functionally enriched areas in tumour suppressors or proto- 
oncogenes, (III) determination of genes that are enriched for likely 
damaging mutations, (IV) a comparison with genes whose biological 
relevance has been established in SCLC mouse models, and (V) a 
listing of genes with a likely therapeutic relevance or that are otherwise 
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Figure 1 | Genomic alterations in small cell lung cancer. a, Tumour samples 
are arranged from left to right. Alterations of SCLC candidate genes are 
annotated for each sample according to the colour panel below the image. The 
somatic mutation frequencies for each candidate gene are plotted on the 
right panel. Mutation rates and type of base-pair substitution are displayed 
in the top and bottom panel, respectively. Significant candidate genes are 
highlighted in bold (*corrected q-values < 0.05, fP < 0.05, {P < 0.01). The 


frequently affected by genetic alterations in human cancers (that is, 
genes in the Cancer Gene Census™* and COSMIC” database). 

Among the significantly mutated genes (I), (q-values < 0.05, 
Methods) were TP53 and RBI (refs 4-7), KIAA1211 and COL22A1, 
as well as RGS7 and FPR1, both of which are involved in G-protein- 
coupled receptor signalling (Fig. 1a). 

Locally clustered mutations (II) are indicative of functional selec- 
tion (P<0.05, Supplementary Table 6, Methods)*"®. Of all genes, 
Fig. 1a lists those alterations that occurred in more than 8% of the 
samples, were otherwise affected by recurrent genomic rearrange- 
ments (Supplementary Table 4), or were mutated in Trp53 ’, 
Rb1-’~ or Trp53.’-, Rb1 ‘~, Rbl2-’~ SCLC tumours arising in 
mice*? (Supplementary Table 5). Confirming previous results and 
our analytical strategy, the histone acetyltransferase genes CREBBP 
and EP300 exhibited significantly clustered mutations and recurrent 
inactivating translocations (Fig. la and Extended Data Fig. 3c)’. 
Furthermore, significant mutation clustering occurred in genes with 
functional roles in the centrosome (ASPM, ALMS1 and PDE4DIP), in 
the RNA-regulating gene XRNI and the tetraspanin gene PTGFRN; 
the latter was also mutated in murine SCLC (Extended Data Fig. 3c). 
The TP53 homologue TP73, which was also affected by recurrent 
somatic rearrangements (Fig. 1a), also showed clustered mutations. 

In the group of significantly damaged genes (III) we also found 
TP53, RB1, CREBBP and COL22Al1, further highlighting their likely 
biological relevance in SCLC. Additional inactivating mutations 
occurred in FMN2 and NOTCH1 (P<0.01). NOTCH family genes 
were recurrently mutated with a pattern of frequent inactivation. 
Notch3 was also mutated in a Trp53 ‘~ Rb1~’-,RbI2-’—_ mouse 
tumour (Fig. 1a, Supplementary Table 5 and Methods). 

Of the genes with an established role in murine SCLC (IV), we 
confirmed PTEN'®'’. RBL1 and RBL2, which are closely related 
to RBI (ref. 9), similarly exhibited inactivating translocations and 
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respective level of significance is displayed as a heatmap on the right panel. 
Genes that are also mutated in murine SCLC tumours are denoted with a 

§ symbol. Mutated cancer census genes of therapeutic relevance are denoted 
with a + symbol. b, Somatic copy number alterations determined for 142 
human SCLC tumours by single nucleotide polymorphism (SNP) arrays. 
Significant amplifications (red) and deletions (blue) were determined for the 
chromosomal regions and are plotted as q-values (significance < 0.05). 


mutations (Fig. la, Extended Data Fig. 3d and Supplementary 
Table 4). Mice with inactivation of Trp53, Rb1 and Rbi2 develop 
SCLC with shorter latency than mice lacking Trp53 and Rb1 alone’, 
thus validating RBL2 as another accessory tumour suppressor in SCLC. 

Given the lack of therapeutic options in SCLC, we sought mutations 
that are known oncogenic drivers in other cancers and sometimes 
associated with response to targeted drugs (V)'*"°. Of these, we found 
mutations in four tumours with a potential therapeutic implication, 
including mutations in BRAF"’, KIT?” and PIK3CA” (Extended Data 
Fig. 3e). Thus, genotyping of SCLC patients may reveal individual 
patients who might have a possible benefit from targeted therapeutic 
intervention. 

Across these five categories, mutations in CREBBP, EP300, TP73, 
RBL1, RBL2 and NOTCH family genes were largely mutually exclusive 
(Fig. la), suggesting that they may exert similar pro-tumorigenic 
functions in the development of SCLC. We did not observe significant 
correlations of global mutational signatures (for example, predom- 
inance of C:G>A:T transversions) with the mutational status of these 
genes (Extended Data Fig. 2b). Furthermore, mutations in these 
genes were not significantly associated with the total number of muta- 
tions, overall survival or other clinical parameters (Extended Data 
Fig. 4). The mutation status of 22 of the most frequently mutated 
genes was confirmed in an independent data set (Methods and 
Supplementary Table 7). 

By analysing somatic copy number alterations, we confirmed prev- 
iously known genomic losses within 3p pointing to focal events on 
3p14.3-3p14.2 (harbouring FHIT°) and 3p12.3-3p12.2 (harbouring 
ROBOI (ref. 22)) (Fig. 1b, Extended Data Fig. 5a and Supplementary 
Table 8)°’*??. FHIT expression was also reduced in cases with focal 
deletions (Extended Data Fig. 5b). In addition to homozygous 
losses in the CDKN2A locus (Extended Data Fig. 5c), amplification 
of the MYC family genes’, MYCL1, MYCN and MYC, as well as of the 
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Figure 2 | Universal bi-allelic inactivation of TP53 and RB1 in human 
SCLC. a, Alterations of TP53 and RB1 were determined based on whole- 
genome sequencing data of 108 SCLC cases. Samples are plotted from left to 
right. Alleles A and B are represented for each case and colour-coded according 
to the somatic alteration. The integral copy number (iCN) state of each 

allele is plotted; hemizygous losses are annotated as loss of heterozygosity 
(LOH), copy-neutral LOH or LOH at higher ploidy. Samples retaining allele A 
and B show alterations on both alleles (bi-allelic alterations). b, Circos plot 
of case 02297 showing intra- and interchromosomal translocations between 
chromosome 3 and 11. The copy number state of the respective chromosomal 
regions (iCN) is plotted as a heatmap. The genomic context of CCND1 (on 
chromosome 11) is highlighted. c, Significantly differentially expressed genes 
encoded on chromosome 11 are analysed in both chromothripsis cases in 
comparison to all other tumours. Positive and negative z-scores show 
upregulation and downregulation of genes, respectively (P < 0.05; *q-value 
<0.05). d, Distribution of CCND1 expression over 81 SCLC samples. 
Chromothripsis cases are highlighted in red. e, Haematoxylin and eosin (H&E) 
and immunohistochemistry staining for cyclin D1 and Rb1 for sample S02297. 
Original magnification, X 400. 


tyrosine kinase gene, FGFRI (refs 2, 24), and IRS2 were recurrent 
genomic events (Fig. 1b). Focal IRS2 amplifications occurred in 2% 
of the cases (Extended Data Fig. 5d, e). 


Universal inactivation of TP53 and RB1 


Inactivating mutations in TP53 and RBI have been shown to affect 
up to 90% and up to 65% of SCLC, respectively*’. By contrast, our 
whole-genome sequencing analyses revealed that both genes were 
altered in all but two cases that exhibited signs of chromothripsis* 
(Figs 1 and 2). TP53 and RBI alterations were mostly inactivating 
(Supplementary Table 9 and Extended Data Fig. 6a). Missense muta- 
tions in TP53 affected the functionally critical DNA binding domain, 
while RB1 was frequently altered by complex genomic translocations. 
Many mutations in RBI occurred at exon-intron junctions, which 
caused protein-damaging splice events as confirmed by transcriptome 
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sequencing (Extended Data Fig. 6b-e and Supplementary Tables 10 
and 11). In the 108 tumours without chromothripsis, TP53 and RB1 
had bi-allelic losses in 100% and 93% of the cases, respectively. 
Inactivating events included mutations, translocations, homozygous 
deletions, hemizygous losses, copy-neutral losses of heterozygosity 
(LOH) and LOH at higher ploidy (Fig. 2a, Extended Data Fig. 6 and 
Supplementary Table 9). Loss of CDKN2A occurred in cases with both 
bi-allelic inactivation of TP53 and RB1 and hemizygous loss of RB1 
(Fig. 2a). Although emerging data supports a continuum model of 
inactivation of tumour suppressors across multiple cancers, TP53 
and RB1 follow the classical discrete ‘two-hit paradigm’ pattern of 
Knudson-type tumour suppressors in SCLC”*’. 

The two tumours affected by chromothripsis displayed a similar 
pattern of massive genomic rearrangements between chromosomes 3 
and 11 (Fig. 2b and Extended Data Fig. 7a), but lacked shared fusion 
transcripts in the transcriptome sequencing data, suggesting that a 
particular fusion is not a common target (Extended Data Fig. 7b and 
Supplementary Table 12). Of the genes on chromosomes 3 and 11, 
CCND1 (encoding cyclin D1) was retained (Fig. 2b and Extended 
Data Fig. 7a) resulting in significant CCND1 overexpression in both 
tumours, but not in the other SCLC specimens (Fig. 2c, d and 
Supplementary Table 13). Immunohistochemistry confirmed high 
expression of cyclin D1 and a lack of nuclear Rb1 (Fig. 2e, Extended 
Data Fig. 7c). There were fewer proliferating Ki67-positive cells in 
these two cases. As cyclin D1 negatively regulates Rb family proteins”, 
these findings suggest that chromothripsis in cases with wild-type 
RB1 may compensate for genomic loss of RBI. 

Together, our findings provide evidence for the notion that com- 
plete genomic loss of both TP53 and RB1 function is obligatory in the 
pathogenesis of SCLC. 


Oncogenic genomic events affecting TP73 


We analysed the genome sequencing data for the presence of clustered 
chromosomal breakpoints that may indicate a common biological 
target (Supplementary Table 14)” and found 5 major clusters affect- 
ing RB1, as well as regions on chromosomes 1, 3 (3q26), 6 (affecting 
CDKALI) and 22 (Fig. 3a). Breakpoints in chromosome 22 caused 
inactivating translocations of TTC28 (Extended Data Fig. 8a)*°. 
Breakpoints also clustered downstream of the L1HS retrotransposon 
in SCLC, further supporting a role for this element in cancer*™’, 
(Extended Data Fig. 8b). Breakpoints on chromosomes 3, 6 and 22 
did not result in changes of expression of the affected genes 
(Supplementary Table 10). 

By contrast, genomic breakpoints affecting chromosome 1 clus- 
tered precisely in the TP73 locus in 7% of the cases (n = 8). To our 
surprise, several breakpoints were recurrently located in introns 1, 2 
and 3 of TP73. In two cases, breakpoints led to complex intrachro- 
mosomal rearrangements (Extended Data Fig. 8c and Supplementary 
Table 4), while the majority of breaks caused intragenic fusions and, 
thus, exclusion of either exon 2, or exons 2 and 3, which were all 
somatic (Fig. 3b and Extended Data Fig. 8d). Some rearrangements 
were copy-neutral events, while others occurred on the background of 
copy number gains (Extended Data Fig. 8e). One tumour sample 
revealed genomic exclusion of exon 10 (Fig. 3b). Analyses of tran- 
scriptome sequencing data confirmed that these rearrangements 
created the N-terminally truncated transcript variants p73Aex2 
and p73Aex2/3, as well as p73Aex10 (ref. 33) (Fig. 3c, d and 
Supplementary Table 11). Genomic validation and comparative pro- 
filing of transcript variants confirmed that p73Aex2/3 were not nat- 
urally occurring splice variants in SCLC and were only found in cases 
with genomic rearrangements (Fig. 3d and Supplementary Table 11). 
Some tumours expressed p73Aex2, in which we failed to identify 
genomic rearrangements (Supplementary Table 11). 

p73Aex2 and p73Aex2/3 lack a fully competent transactivation 
domain and are known tumour-derived variants of TP73 (ref. 33). 
p73 with N-terminal truncations has dominant negative functions 
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Figure 3 | Recurrent rearrangements generating oncogenic variants of 
TP73. a, Genomic breakpoints identified by whole-genome sequencing were 
mapped to their chromosomal locations. Recurrent breakpoints (n > 6 
samples) are highlighted in colours. b, Schematic representation of the TP73 
locus (hg19) illustrating intragenic translocations. Coding and non-coding 
regions of the annotated exons are shown as black and white boxes, respectively. 
c, Schematic representation of exons encoding p73, p73Aex2, p73Aex2/3 and 
p73Aex10. d, Exon skipping events were assessed in the transcriptome data 
of samples with genomic translocations resulting in p73Aex2, p73Aex2/3 and 
p73Aex10 transcript variants. S02139 served as a reference sample without 
TP73 alterations. The expression of uncommon exon combinations is 
highlighted in red. 
on wild-type p73 and p53, and is a confirmed oncogene in vivo**”. 
p73Aex10 results in an early stop codon; C-terminal truncations can 
similarly exert dominant-negative effects on wild-type p73 (ref. 33). 
Altogether, TP73 was somatically altered by mutations and geno- 
mic rearrangements in 13% of the cases (Fig. 1 and Extended Data 
Fig. 8c-e). To our knowledge, this is the first study describing 
p73Aex2/3 variants to emerge as a consequence of precise genomic 
rearrangements. 


Tumour suppressive roles of Notch in SCLC 

In an unsupervised hierarchical clustering analysis of transcriptome 
sequencing data (Methods), we observed two major clusters of SCLC 
tumours (Fig. 4a and Extended Data Fig. 9a). The majority (77%, 
n = 53/69) of tumours exhibited high expression of the neuroendo- 
crine markers CHGA (chromogranin A)’ and GRP (gastrin releasing 
peptide)’, had high levels of DLK1 (ref. 36), a non-canonical inhibitor 
of Notch signalling”’, and ASCL1, a lineage oncogene of neuroendo- 
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crine cells whose expression is inhibited by active Notch signalling 
(Extended Data Fig. 9b)’””*. The remaining cases (23%, n = 16/69) 
also expressed SYP (synaptophysin) or NCAM1 (CD56), thus con- 
firming that all tumours were of the typical SCLC subtype’ (Extended 
Data Fig. 9c). Furthermore, no significant difference in the distri- 
bution of the major known SCLC mutations (for example, TP53, 
RB1 or CREBBP) existed between the two transcriptional subtypes 
(Extended Data Fig. 9a). Thus, although all SCLC tumours shared the 
most frequent mutations as well as key neuroendocrine markers, the 
majority had a gene expression pattern suggestive of low Notch path- 
way activity (high levels of ASCL1 and DLK1). 

Mutations affected NOTCH family genes in both human and mur- 
ine SCLC (Fig. 1a and Supplementary Table 5). The mutations did not 
cluster significantly in any individual domains, but frequent dam- 
aging mutations occurred in the extracellular domain (Fig. 4b and 
Extended Data Fig. 10a), suggesting that NOTCH may be a tumour 
suppressor in SCLC. Overall, the NOTCH family was affected by 
genomic alterations in 25% of human SCLC (Fig. 1). 

Based on these observations and emerging evidence that activation 
of Notch signalling may inhibit the expansion of neuroendocrine 
tumour cells*’*°, we examined the consequences of Notch pathway 
activation in Trp53;Rb1;RbI2 conditional triple-knockout (TKO) 
mice’. We crossed Rosa26ho* ste? Lox Notch2ICD (7 st _N2ICD) mice that 
conditionally express an activated form of Notch2 (Notch2 intracel- 
lular domain, N2ICD) to TKO mice and found a significant reduction 
in the number of tumours that arose in the presence of N2ICD 
(P<0.001; Fig. 4c). Similar results were obtained upon activation 
of Notch], reflecting a general inhibition of SCLC initiation by active 
Notch signalling (Extended Data Fig. 10b). The recombination effi- 
ciency of an innocuous inducible reporter allele (Rosa26”""””"°) by Cre 
was much greater than that of the N2ICD allele, providing further 
support for a strong negative selection against active Notch signalling 
during SCLC development (Extended Data Fig. 10c-e). Importantly, 
the inhibitory effects of Notch observed in the early stages of tumor- 
igenesis correlated with a prolongation of survival of the mutant mice 
expressing N2ICD (Fig. 4d). Similarly, ectopic expression of NIICD 
in both mouse and human SCLC cell lines significantly inhibited their 
growth (Fig. 4e and Extended Data Fig. 10f). 

SCLC tumours in TKO mice showed typical patterns of neuroen- 
docrine differentiation with high expression of synaptophysin and 
Ascll. Consistent with the notion that Notch regulates neuroendo- 
crine differentiation in SCLC, overexpression of N2ICD resulted in 
the upregulation of Hes1 and abrogated expression of neuroendocrine 
markers (Extended Data Fig. 10g). Similarly, NIICD induced upre- 
gulation of Notch targets (for example, Hes1, Heyl, Hey2) in murine 
SCLC cells (Fig. 4f, Extended Data Fig. 10h and Supplementary Table 
15), as well as gene expression signatures consistent with cell cycle 
inhibition (Extended Data Fig. 10i). Ectopic expression of N1IICD 
inhibited cell cycle progression in murine and human SCLC cell lines 
(Extended Data Fig. 10j, k). This cell cycle inhibition is reminiscent of 
what has been seen in other contexts where Notch activation acts as a 
tumour suppressor*’. 

Altogether, our analyses involving genome and transcriptome 
sequencing of human and murine SCLC tumours, as well as studies 
in genetically manipulated mice, identify and validate Notch as a 
tumour suppressor and master regulator of neuroendocrine differ- 
entiation in SCLC. 


Discussion 


Here we provide a comprehensive analysis of somatic genome altera- 
tions in SCLC, identifying many novel candidate genes, some of which 
may have therapeutic implications. Such alterations with immediate 
therapeutic consequences are rare but present in SCLC (for example, 
in BRAF or KIT), suggesting that individual patients may benefit from 
genotyping and subsequent targeted kinase inhibitor therapy. We 
further discovered recurrent expression of p73Aex2/3 in SCLC and 
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Figure 4 | Notch is a tumour suppressor and a key regulator of 
neuroendocrine differentiation in SCLC. a, Unsupervised expression 
analysis of human SCLC tumours. Tumour samples are arranged in columns 
and grouped by the expression of differentially expressed genes (rows). 
Expression values are represented as a heatmap; yellow and blue indicate high 
and low expression, respectively. b, Schematic representation of NOTCH1 and 
NOTCH2. Somatic mutations are mapped to the respective protein domains. 
Damaging and missense mutations are highlighted in red and black, 
respectively. c, Representative H&E images of lungs from Trp53;Rb1;RbI2 
triple-knockout (TKO) or TKO;N2ICD (Notch2) mice collected 3 months after 
Ad-Cre instillation. Scale bar, 1 mm. Tumours were quantified for each 
genotype (n = 8). Statistical significance was determined by two-tailed 
unpaired Student’s t-test. d, Survival analysis of TKO (n = 7, median 


established a genetic mechanistic basis for this oncogenic variant. 
TP73Aex2/3 has recently been demonstrated to function as an onco- 
gene***> and therapeutic options were identified to restrict p73- 
dependent tumour growth in vivo, including in Trp53-deficient 
tumours”. Given the frequent occurrence of genomic TP73 altera- 
tions in SCLC, such approaches may potentially be promising in 
SCLC tumours. Our results furthermore provide proof for universal 
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Figure 5 | Signalling pathways recurrently affected in SCLC. Red and blue 
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survival = 210 days) and TKO;N2ICD (n = 8, median survival = 274 days) 
mice. Statistical significance was determined by log-rank test. e, Cell viability 
assay of the murine SCLC cell line KP1 transfected with a N1ICD (Notch1) 
expression plasmid or empty vector control (Ctrl) (3 independent biological 
replicas with 3 technical replicas each). Fold growth is normalized to day 0; 
representative images were taken on day 8. Scale bar, 50 tm. Statistical 
significance was determined by two-tailed paired Student’s t-test. f, Mouse 
SCLC cells were transfected with control or NIICD and analysed 48 h after 
transfection by gene expression microarrays. The heatmap describes 
differentially expressed genes in control or N1ICD-transfected cells (n = 3, 
each); red and green indicate high and low expression, respectively. *P < 0.05; 
**P < 0.01; ***P < 0.001. Data are represented as mean + s.d. 


bi-allelic inactivation of TP53 and RB1, thereby establishing these two 
genes as obligatory tumour suppressors in SCLC. 

Our genomic analyses also identified NOTCH family genes as 
tumour suppressors and master regulators of neuroendocrine 
differentiation in SCLC, and we validated this finding in vivo in a 
pre-clinical mouse model of this disease. Our observations may thus 
provide an initial link between Notch and the neuroendocrine pheno- 
type in SCLC. In contrast to the involvement of TP73 and NOTCH 
family genes (Fig. 5), the functional role of most of the other 
newly discovered genes (for example, KIAA1211, COL22A1, ASPM, 
PDE4DIP or PTGFRN) is much less clear. Although our analytical 
filters support their involvement in the tumour pathogenesis, func- 
tional experiments will be required to clarify their biological role. 

In summary, we have provided the first, to our knowledge, com- 
prehensive genomic analysis of SCLC, implicating several previously 
unknown genes and biological processes (Fig. 5) in the pathogenesis 
of this disease as possible targets for more efficacious targeted thera- 
peutic intervention against this deadly cancer. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Human lung tumour specimens. The institutional review board of the 
University of Cologne approved this study. We collected and analysed fresh- 
frozen tumour samples of 152 SCLC patients, which were provided by multiple 
collaborating institutions as fresh-frozen tissue specimen, frozen sections or as 
genomic DNA extracted from fresh-frozen material (Extended Data Fig. 1). 
Human tumour samples were obtained from patients under IRB-approved pro- 
tocols following written informed consent. 

The fresh-frozen SCLC samples were primary tumours diagnosed as stage I-IV 
tumours, and snap-frozen after tissue sampling. All tumour samples were patho- 
logically assessed to have a purity of at least 60% and no extensive signs of 
necrosis. Additionally, these tumour samples were reviewed by at least two inde- 
pendent expert pathologists and the diagnosis of SCLC was histomorphologically 
confirmed by H&E staining and immunohistochemistry for chromogranin A, 
synaptophysin, CD56 and Ki67. Matching normal material was provided in the 
form of EDTA-anticoagulated blood or adjacent non-tumorigenic lung tissue 
(Supplementary Table 1). The matched normal tissue was confirmed to be free 
of tumour contaminants by pathological assessment. Furthermore, tumour and 
matching normal material were confirmed to be acquired from the same patient 
by short tandem repeat (STR) analysis conducted at the Institute of Legal 
Medicine at the University of Cologne (Germany), or confirmed by subsequent 
SNP 6.0 array and sequencing analyses. Patient material was stored at —80 °C. 

Whole-genome sequencing was performed on 110 SCLC fresh-frozen tumour 
samples and matched normal material. Additionally, we analysed RNA-seq data 
of 81 SCLC primary tumours (Extended Data Fig. 1 and Supplementary Table 1), 
among which 20 cases were previously published**’. Furthermore, we studied the 
copy-number alterations of a total of 142 fresh-frozen tumour specimen by 
Affymetrix SNP 6.0, among which 74 cases were described before™*. 

Clinical correlation studies were performed with the study cohort of 110 SCLC 
patients considering age of diagnosis, gender, tumour stage, surgery, treatment 
with chemotherapeutics, smoking status, smoking history and overall survival 
(Extended Data Figs 2 and 4 and Supplementary Table 1). The median follow-up 
time for this cohort of 110 SCLC patients was 69 months, and 31% of the patients 
were alive at the time of last follow-up (Extended Data Fig. 2a and Supplementary 
Table 1). Smoking status was available for 88% (n = 97) of the patients; 63% 
(n = 69) reported a smoking history amounting to a median of 45 pack-years. 
Patients with a known smoking history were further subcategorized to heavy 
smokers (>30 pack-years), average smokers (10-30 pack-years) and light/never 
smokers (<10 pack-years). 

Primary findings on somatic mutations were further studied in a second inde- 

pendent cohort consisting of 112 SCLC cases. This validation cohort refers to the 
exome sequencing data of 28 fresh-frozen SCLC primary tumours and 9 SCLC 
cell lines** which were re-analysed in this present study (Supplementary Table 7). 
Additionally, we performed targeted sequencing on 8 fresh-frozen and 67 form- 
alin fixed paraffin embedded (FFPE) samples from SCLC patients 
(Supplementary Table 1). 
Mouse SCLC models and tumour samples. Mice were maintained according to 
practices prescribed by the NIH (Bethesda, MD) at Stanford’s Research Animal 
Facility, accredited by the Association for the Assessment and Accreditation of 
Laboratory Animal Care (AAALAC). The Trp53;Rb1 double-knockout (DKO) 
and the Trp53;Rb1;RbI2 triple-knockout (TKO) mouse models for SCLC have 
been previously described*’. Mice were bred onto a mixed genetic background 
composed of C57BL/6, 129/SvJ and 129/SvOla. SCLC tumours were induced in 
8-week-old mice by intratracheal instillation with 4 X 10’ plaque-forming units 
(p.f.u.) of adenovirus expressing the Cre recombinase (Ad-Cre, Baylor College of 
Medicine, Houston, TX). 

Whole-genome and whole-exome sequencing was performed on 8 murine 
SCLC tumours isolated from DKO and TKO mice. Primary tumours and meta- 
stases were dissected, snap-frozen, and stored at —80 °C. The material was patho- 
logically confirmed to have a tumour content of at least 90%. The respective tail 
tissue was similarly processed and served as a normal reference for 6 tumour 
samples (Supplementary Table 5). Average mutation rates were calculated for 
cases with tumour-normal pairs (n = 6). 

SCLC tumours expressing the activated intracellular domain (ICD) of 
Notchl (Notchl ICD, NIICD) and Notch2 (Notch2 ICD, N2ICD) were 
analysed in mouse models. Rosa26'?**!¢? -Lox-NotchlICD (7 g7_NIICD) or 
Rosa2erer ser teNotch21CP_ (7 ST NICD) mice were obtained from Spyros 
Artavanis-Tsakonas and Exelixis. These mice are similar to recently published 
Rosa26*/'S!-Noteh3ICP ices, Rosa26t tS NUCP or Rosa26t!'S'-NICP mice were 
crossed with TKO mice. TKO or TKO;Rosa26*/S!-NICP mice were infected with 
Ad-Cre at week 8 and their survival was monitored. The sample size was chosen 
based on our experience with these mouse models of cancer (a minimum of 3-5 
mice usually ensures statistical significance if the phenotypes are robust). We used 


both males and females in these experiments, littermates served as controls. 
Tumour initiation was studied three months after Ad-Cre instillation. The lungs 
were fixed and tumour burden was quantified using ImageJ software. To control 
for the efficiency of deletion, we also crossed TKO mice to Rosa26""”""° reporter 
mice**. For all tumour quantifications, the investigator was blinded to the geno- 
types when the H&E pictures were taken, and during the quantification of tumour 
number and area. No samples or animals were excluded from the analyses, and no 
randomization was performed. 

DNA and RNA extractions. Nucleic acids were extracted from fresh-frozen 
tissue specimen which were processed to 15-30 sections each of 20 um thickness 
at a cryostat maintaining a temperature of —20 °C (Leica). In the case of FFPE 
samples, 6-10 sections of 10 tim thickness were prepared. 

DNA was extracted from fresh-frozen tissues, EDTA blood, or FFPE samples 
using the Gentra Puregene DNA extraction kit (Qiagen) following the protocol of 
the manufacturer. DNA isolates were hydrated in TE-buffer and confirmed to be 
of high molecular weight (>10 kb) by agarose gel electrophoresis. Genomic DNA 
from fresh-frozen samples with evident signs of degradation were excluded from 
further sequencing studies. 

For RNA extractions, tissue sections were first lysed and homogenized with the 
Tissue Lyzer (Qiagen). Subsequent RNA extractions were performed with the 
Qiagen RNAeasy Mini Kit according to the instructions of the manufacturer. The 
RNA quality was assessed at the Bioanalyzer 2100 DNA Chip 7500 (Agilent 
Technologies) and samples with a RNA integrity number (RIN) of over 7 were 
further analysed by RNA-seq. 

Next-generation sequencing. All sequencing reactions were performed on an 
Illumina HiSeq 2000 instrument (Illumina, San Diego, CA, USA). 
Whole-genome sequencing. Whole-genome sequencing was performed with 
DNA extracted from fresh-frozen tumour and normal material. Short insert 
DNA libraries were prepared with the TruSeq DNA PCRfree sample preparation 
kit (Illumina) for paired-end sequencing at a minimum read length of 2 X 100 bp. 
Human DNA libraries were sequenced with the aim to obtain a coverage of 
minimum 30X for both tumour and matched normal. Murine DNA libraries 
of tumour and matched normal were both sequenced to a coverage of 25X. 
Whole-exome sequencing. Whole-exome sequencing was performed on fresh- 
frozen tissue specimen from mice. The enrichment for the exome was performed 
with the SureSelectXT Mouse All Exon kit (Agilent) following the protocol of the 
manufacturer. The exon-enriched libraries were subjected to paired-end sequen- 
cing with a read-length of 2 x 100bp. Both tumour and normal material was 
sequenced to a minimum coverage of 60X. 

RNA-sequencing. RNA-sequencing (RNA-seq) was performed with RNA 
extracted from fresh-frozen human tumour tissue samples. cDNA libraries were 
prepared from poly(A) selected RNA applying the Illumina TruSeq protocol for 
mRNA. The libraries were then sequenced with a 2 X 100 bp paired-end protocol 
to a minimum mean coverage of 30X of the annotated transcriptome. 

Targeted enrichment sequencing. Targeted enrichment sequencing was per- 
formed on human FFPE and fresh-frozen tumour and normal specimen for 
the purpose of validating genome alterations in an independent cohort. The 
custom probe design was constructed with SureDesign (Agilent Technologies) 
enriching for the exons of 22 genes of interest. DNA libraries were prepared with 
the SureSelect XT reagent kit according to the manufacturer’s instructions 
(Agilent Technologies) and sequenced with the aim to obtain a coverage of at 
least 200X. 

Dideoxy sequencing for validation of somatic alterations. If available, RNA-seq 
or exome sequencing was used to validate somatic mutations determined by 
genome sequencing. Alternatively, dideoxynucleotide chain termination sequen- 
cing (Sanger sequencing) was performed to validate mutations, genomic rearran- 
gements, and chimaeric fusion transcripts. Primer pairs were designed to amplify 
the target region encompassing the somatic alteration. The PCR reactions were 
performed either with genomic DNA, whole-genome amplified DNA or cDNA. 
The amplified products were subjected to Sanger sequencing and the respective 
electropherogram was analysed with Geneious (http://www.geneious.com). 
Copy number analysis by Affymetrix SNP 6.0 arrays. Human DNA extracted 
from fresh-frozen tumour specimen was hybridized to Affymetrix Genome-Wide 
Human SNP array 6.0 following the manufacturer’s instructions. The signal 
intensities were processed to analyse for chromosomal gene copy number data. 
Raw copy number signals and segmented copy number data were computed 
following the procedure described previously”. 

The raw, unsegmented copy number signals were used to analyse for signifi- 
cant copy number alterations applying the method CGARS”. Significant ampli- 
fications were determined with the upper quantiles 0.25, 0.15, 0.1, and 0.05; 
deletions were computed in reference to the 0.25 lower quantile. The significance 
threshold was set at a q-value of 0.05 (Supplementary Table 8). 
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Data processing. The raw sequencing reads of human and mouse samples 
acquired from whole-genome, whole-exome or targeted enrichment sequencing 
were aligned to the respective human (NCBI37/hg19) or mouse reference genome 
(NCBI37/mm9). The alignment was performed with the BWA aligner* (version 
0.6.1-r104). Concordant read-pairs were identified as potential PCR duplicates 
and were subsequently masked in the alignment file. The quality of the sequen- 
cing data was determined and is summarized in Supplementary Table 2. 

The whole-genome sequencing data of human samples was analysed for purity 
and ploidy with methods previously described” (Extended Data Fig. 3a and 
Supplementary Table 2). 

Somatic mutations and copy number alterations were determined with our in- 
house analysis pipeline**’. The calling of somatic mutations in human samples 
was further improved by filtering the identified variant against the sequencing 
data of more than 500 normal samples (including exome or genome sequencing 
data). Additionally, an estimation of human DNA library contamination was 
implemented to enhance sensitivity and specificity of mutation calling. 
Analysis of significantly mutated and biologically relevant genes. The signifi- 
cance of recurrently mutated genes was analysed for the whole-genome sequen- 
cing data set of 110 human SCLC samples (Extended Data Fig. 1a). 

As previously described’, the analysis first estimated the background mutation 
rate for each gene and corrected for its expression by referring to the RNA-seq 
data of 81 human primary SCLC tumour specimen analysed in the present study. 
The analysis included those genes which had FPKM values (fragments per kilo- 
base of exon per million fragments mapped) of over 1 in at least 50 samples. 
Following corrections for the occurrence of synonymous mutations, significantly 
mutated genes were determined with q-values of <0.05 (Fig. la, Extended Data 
Fig. 1a (filter I) and Supplementary Table 6). 

Mutations that cluster within a gene are defined as a mutational hotspot similar 
to our previously described method’. Here we used an analytical derivation of the 
test statistics, rather than resampling. To this end, the mutated positions are 
rescaled to lie within zero and one (using the protein length). Under the null 
hypothesis of having no particular mutational hotspot, the rescaled mutated posi- 
tions are uniformly distributed between zero and one, thus its expected value is 0.5. 
We therefore chose the final statistics as sum over the modulus of the rescaled 
position minus 0.5. This allows that the distribution under the null hypothesis can 
analytically be calculated; hence, also the P values. The analysis was calculated for 
genes that were significantly mutated in at least 5% of the samples with P< 0.05 
(Fig. 1a, Extended Data Fig. 1a (filter II) and Supplementary Table 6). In order to 
further filter for the genes of relevance, subsequent analysis considered those genes 
recurrently mutated in more than 8% (n > 8) samples. The called genes were scored 
for their relevance by either analysing recurrent translocations affecting these genes 
(Supplementary Table 4) or by comparison with the mouse SCLC mutation data to 
identify alterations in common genes (Supplementary Table 5). 

Additionally, recurrent mutations were scored for the accumulation of clearly 
damaging mutations in which splice site, frameshift and nonsense mutations 
were considered as damaging mutations. Here, we restricted the aforementioned 
significance analysis only to this class of mutations (by restricting the background 
mutation rate only to damaging mutations) and determined significance at 
P<0.01 (Fig. 1a, Extended Data Fig. 1a (filter III) and Supplementary Table 6). 

Genetic alterations were further scored for their relevance by comparison with 
genes that were functionally characterized in genetically engineered mouse models 
(GEMM) for SCLC”"®, or by comparing somatic mutations in SCLC with muta- 
tions in other cancer types reported in the Cancer Gene Census"* and in COSMIC 
(catalogue of somatic mutations in cancer)’ (Fig. 1a, Extended Data Fig. 1a (filter 
IV and V) and Supplementary Table 6). Additionally, the sequencing data of mouse 
SCLC specimen was used to identify alterations in common genes. 

Analysis of subclonal architecture. To determine the subclonal architecture 
from genome sequencing data, we first computed the cancer cell fraction (CCF; 
that is, the fraction of cancer cells carrying a particular mutation) of each called 
somatic point mutation. To this end, we first estimated the tumour purity, absolute 
copy numbers, and subclonal copy number changes using our previously described 
method? and computed for each mutation the expected allelic fraction under 
clonality assumption. The quotient between the observed allelic fraction of a muta- 
tion with its corresponding expected allelic fraction then yields the CCF. To assess 
the clonal and subclonal populations we next identified distinct clusters in the CCF 
profile and assigned each mutation to the cluster of highest probability. In order to 
provide a measure for the subclonal architecture, we proposed the following score: 
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where i = 0 represents the clonal population, i = 1,...,2c the subclonal populations; 
gj; is the CCF of each population (thus, ~ ~ 1), and m; is the number of mutations 
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assigned to cluster i. This subclonality score can be interpreted as the probability 
that a randomly selected mutation present in a single cancer cell is subclonal 
throughout the entire tumour. 

As a low sequencing depth limits the robust identification of subclonal popula- 
tions, we computed the genome-wide average contribution of a single mutated 
read to the CCF. For a given tumour purity p, average ploidy 2, and mean 
coverage c, this measure is given by: 


21—p) tpn 
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The smaller the average increase of CCF per read, the more accurately the sub- 
clonality score can be determined since more subclonal mutations can be called 
from the sequencing data. In this study, the most limiting factor for assessing the 
subclonal diversity is the relatively low sequencing depth (35X on average). We 
therefore used this measure to select the samples that are suitable for a reliable 
calculation of the subclonality score. To this end, we systematically scanned from 
the average increase of CCF per read form large to small values and detected the 
point of the most prominent change in the distribution of the subclonality score 
(Supplementary Table 2). 
Analysis of genomic breakpoints. Genomic rearrangements were reconstructed 
from the whole-genome sequencing data of 110 human SCLC samples following 
the procedure as previously described*’. The genomic rearrangements called from 
each tumour sample were further filtered against a library of 110 normal genomes 
to thus minimize the detection of false-positive rearrangements. Genomic break- 
points of SCLC candidate driver genes are listed in Supplementary Table 4. 

The genomic breakpoints of all samples were mapped to their chromosomal 

locations and recurrent breakpoints clustering within the range of 100 kb 
were identified with a similar approach described previously” (Supplementary 
Table 14). 
Processing and analysis of RNA-seq data. RNA-seq data was processed as 
previously described*”’ to detect chimaeric transcripts and to determine the 
transcriptional abundance of annotated transcript variants. In brief, paired-end 
RNA-seq reads were mapped to the human reference genome (NCBI37/hg19) 
using GSNAP. Potential chimaeric fusion transcripts were identified by discord- 
ant read pairs and by individual reads mapping to distinct chromosomal loca- 
tions. The sequence context of rearranged transcripts was reconstructed around 
the identified breakpoint and the assembled fusion transcript was then aligned to 
the human reference genome to determine the genes involved in the fusion. 

Cufflinks was used to determine the expression levels of annotated transcripts 
referring to unique paired-end reads which align within the expected mapping 
distance. The expression is represented as FPKM values (Supplementary Table 10). 
Transcript splicing analysis. RNA-seq data was used to analyse for alternative 
splicing events of TP53, RBI and TP73 caused by exon skipping or intron reten- 
tion (Supplementary Table 11). The paired-end reads were mapped to the ref- 
erence genome (hg19) using STAR mapper. In reference to the annotation of 
exon junctions provided from UCSC genes and RefSeq the following parameters 
were applied: ref 1, options:—alignIntronMin 20,-alignIntronMax 500000,- 
outFilterMismatchNmax 10, and-chimSegmentMin 10. The coordinates of reads 
potentially crossing exon boundaries were derived from the respective 
“SJ.out.tab” file and compared to the reference annotation. Subsequently, junc- 
tion read counts were assigned to all transcripts containing the respective exon 
combination. If the exon combination is novel, read counts were assigned to those 
transcripts sharing one of the exons contributing to the novel junction. For 
subsequent analyses the transcript with the highest number of junction read 
counts was used as a reference. Additionally, for exon combinations unique to 
alternative transcripts a representative transcript was selected based on total read 
counts. The read counts of each exon junction were normalized to the reads per 
kilobase per million mapped reads (RPKM) per sample. These expression values 
were further normalized per gene by dividing by the average expression of the 
exons of the reference transcript. Potentially novel exon combinations were 
rejected if the average expression of the reference transcript was <2 or if their 
expression were <10% of the reference transcript expression. 

Differential expression for outlier studies. Differential gene expression analysis 
was performed to compare the transcriptional profile of the two chromothripsis 
cases (S02297 and S02353) with other non-chromothripsis SCLC cases and to 
thus identify outliers in the expression profile. The expression was analysed 
by computing z-scores for all samples referring to the RPKM values and using 
the R function ‘scale’; RPKM values smaller than 3 were set to 0. In order to 
prioritize for genes differentially expressed in the two samples S02297 and 
S02353, genes were ranked by their respective z-scores. Statistical testing was 
performed for genes on Chr 3 and Chr 11, respectively. The P values were then 
combined from the two samples using Fisher’s method and corrected for multiple 
hypothesis testing by using the Benjamini-Hochberg approach. Differentially 


Average increase of CCF per read = 
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expressed genes with a P<0.05 and q-values <0.01 are provided in Supple- 
mentary Table 13. 

Unsupervised expression clustering. Unsupervised clustering was performed 
with RNA-seq data of 69 SCLC cases for which matching genome sequencing 
data was available (Fig. 4a and Extended Data Fig. 9a). As expression values are 
approximately following log-normal distribution, we transformed raw FPKM of 
each transcript by logo(1+FPKM). The resulting expression scores were then 
searched for a high and low expression characteristic over the samples. To this 
end, expression scores of each transcript were divided into two states using 
k-means clustering. To prevent an accumulation of artificial signals, only tran- 
scripts with at least 6 samples in each state and having a state-averaged fold 
change larger than 3 are considered for further analysis. A t-test is then computed 
between the two states of the remaining transcripts and corrected for multiple 
hypothesis testing using the false discovery rate framework. Next, transcripts 
having a q-value smaller than 0.01 were selected. For genes with multiple tran- 
script variants, the transcript with the smallest q-value was chosen as represent- 
ative transcript. Then, invariant genes were removed (having a standard deviation 
across all samples <2). To improve clustering, only genes that share a similar 
pattern of the two states in at least 6 other genes are finally selected (using a 
Fisher’s exact test with a significance threshold of 10°). Using the determined list 
of transcripts/genes, hierarchical clustering (Euclidean distance, complete link- 
age) was performed on the raw expression scores. 

IRS2 amplification FISH assay. A fluorescence in situ hybridization (FISH) 
assay was used to detect and confirm JRS2 amplifications at the chromosomal 
level. We performed a signal detection approach, with two probes on chro- 
mosome 13: the reference probe is located on the centromeric region of chro- 
mosome 13 (Empire Genomics, Art.Nr.: CHR13-10-GR) and was labelled with 
green 5-fluorescein dUTP to produce a green signal; the target probe is located on 
the IRS2 locus spanning 13q33.3-34 and was labelled with biotin to produce a red 
signal using the CTD-2083015 BAC clone (Life Technologies, CA, USA). As 
previously described”, slides of FFPE and fresh-frozen samples of tumour tissues 
were prepared, stained and analysed at a fluorescence microscope (Zeiss, Jena, 
Germany) with a 63X oil immersion objective. A non-amplified nucleus showed 
one red target signal for every corresponding green reference signal, with a red/ 
green ratio of 1:1 (Extended Data Fig. 5e). High-level amplifications were deter- 
mined for at least 10 red signals. In some cases the red signals were observed as 
clusters in the cells. At least 100 nuclei per case were evaluated. 
Immunohistochemistry. Immunohistochemistry was performed on human 
tumour FFPE samples to analyse for the protein expression of Rb, cyclin D1, 
p53, p14 (ARF), and p16. The staining was performed with the BenchMark XT 
automated immunohistochemistry slide staining system (Roche). The following 
antibodies and conditions were applied: Rb (C-15) rabbit polyclonal (Santa Cruz; 
FFPE retrieving conditions: 60 min at pH 6.0; dilution: 1:500; incubation: 60 min, 
37 °C); Cyclin D1 clone SP4 rabbit monoclonal (Microm France; FFPE retrieving 
conditions: 90 min at pH 8.4; dilution: 1:200; incubation: 60 min, 37 °C); p53 clone 
DO7 mouse monoclonal (Dako; FFPE retrieving conditions: 60 min at pH 8.4; 
dilution: 1:25; incubation: 60 min, 37 °C); p14 ARF clone 4C6/4 mouse monoclonal 
(Cell Signaling; FFPE retrieving conditions: 60 min, water-bath 98 °C at pH 6.0; 
manual immunohistochemistry staining with Novolink Max polymer detection 
system (Leica); dilution: 1:4,000; incubation: overnight, 4 °C); p16 INK4 Ab-7 clone 
PO7 mouse monoclonal (Neomarkers; FFPE retrieving conditions: 60 min at pH 
8.4; dilution: 1:800; incubation: 60 min, room temperature). 

For immunohistochemistry on mouse tumour FFPE samples, sections were 
permeabilized for antigen retrieval by microwaving in a citrate-based antigen 
unmasking solution (Vector Laboratories). The following antibodies were used: 
GFP (Invitrogen; A11122; dilution: 1:400), RFP/Tomato (Rockland Immuno- 
chemicals; 600-401-379; dilution: 1:500), Notch2 (Cell Signaling; 5732; dilution: 
1:200), Hes1 (Cell Signaling; 11988; dilution: 1:200), Ascll (BD Biosciences; 
556604; dilution: 1:200) and Synaptophysin (Neuromics; MO20000; dilution: 
1:200). Sections were developed with DAB (Vector Labs) and counterstained with 
haematoxylin. 


Cell lines, tissue culture and transfections. Mouse (KP1) and human (NJH29, 
NCI-H82 and NCI-H187) SCLC cell lines were grown in RPMI-1640 media 
supplemented with 10% bovine growth serum (BGS) (Fisher Scientific) and 
penicillin-streptomycin-glutamine (Gibco), as described before’. KP1, NJH29 
were generated at Stanford. NCI-H82 and NCI-H187 were purchased from 
ATCC. These cells grow as suspension spheres or aggregates in culture. All cell 
lines were maintained at 37°C in a humidified chamber with 5% CO3. All cell 
lines tested negative for mycoplasma infection. For transient expression of 
Notch ICD, cells were trypsinized and transfected with either MigR1-IRES- 
GFP (Ctrl) or MigR1-Notch1-ICD-IRES-GFP (NICD) using Lipofectamine 
2000 (Life Technologies). The plasmids were gifts from W.S. Pear (University 
of Pennsylvania, Philadelphia). Then 48 h after transfection, cells were trypsi- 
nized and resuspended in phosphate-buffered saline (PBS) containing 10% BGS 
and 1 pg ml ' 7-aminoactinomycin D (Life Technologies) that labels dead cells. 
Live GFP* cells were then sorted for subsequent experiments using a BD 
FACSAria fluorescence-activated cell sorting (FACS) machine. 
Gene expression and microarray analysis. Gene expression and microarray 
analyses were performed with the mouse cell line KP1 transiently transfected 
with MigR1-IRES-GFP (Ctrl) or MigR1-Notch1-ICD-IRES-GFP (N1ICD). 
Then 1 X 10° GFP™ cells were sorted and the RNA isolated using the AllPrep 
DNA/RNA micro kit (Qiagen). RNA quality assessment using the 2100 
Bioanalyzer (Agilent) as well as the subsequent cDNA preparation steps for 
microarray analysis were performed at the Stanford Protein and Nucleic Acid 
(PAN) facility using the GeneChip Mouse Gene 2.0 ST Array (Affymetrix). For 
gene expression analysis, the Robust Multichip Average (RMA) Express 1.0.4 
program was used for background adjustment and quantile RMA normalization 
of the 41,345 probe sets encoding mouse genome transcripts. Linear models for 
microarray data (LIMMA) was used to compare Ctrl or NIICD samples on RMA 
normalized signal intensities. Only genes with an adjusted P value of 0.05 or less 
were considered as significantly differentially expressed. A total of 769 probes 
accounting for 760 genes were significant, and the expression levels of these genes 
were represented as a heatmap using the heatmap.2 function in R. The analysis 
was performed in triplicates. A list of significant genes is provided in 
Supplementary Table 15. 
MTT cell viability assay. Sorted GFP™ cells were seeded at 1 X 10* per well in 96- 
well plates. The MTT reagents (Roche) were added on days 0, 2, 4, 6 and 8 for 
mouse SCLC cell lines or on days 0, 2, 4 and 6 for the human SCLC cell line 
NJH29. The absorbance wavelength was 570 nm with a reference wavelength of 
650 nm. 
EdU incorporation assay. Transfected cells were treated with 10 uM EdU (5- 
ethynyl-2’-deoxyuridine) (Life Technologies) for 3 h before trypsinization for 
FACS. 1 X 10° live, GEP* cells were sorted and labelled with EdU using the Click- 
iT EdU Pacific Blue flow cytometry assay kit (Life Technologies). Cells were then 
run through the BD FACSAria to analyse for per cent EdU incorporation. 
Data reporting. No statistical methods were used to predetermine sample size. 
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Extended Data Figure 1 | Genomic analyses in SCLC tumours. a, Schematic detailing the genomic study and number of samples as well as various steps of 
analyses for the identification of candidate genes in SCLC. b, Illustration of the number of samples analysed in this study. 
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Extended Data Figure 2 | Clinical molecular-correlation analyses. 

a, Survival analysis of SCLC patients based on clinical stage and treatment 
options (surgery and/or chemotherapy). Statistical significance was determined 
by log-rank test. b, Analyses of clinical stage and smoking status and the 


non-synonymous mutations (n=69) p-value: 0.164 


C:G>A:T transversion rate (n=69) p-value: 0.737 


* Kruskal-Wallis test 


respective effect on number and type of mutations, as well as mutational 
subclonality in tumours. Statistical significance was determined by Kruskal- 
Wallis analysis. 
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Extended Data Figure 3 | Genomic characterization of SCLC tumours. 

a, Purity and ploidy determined in SCLC tumours by whole-genome 
sequencing presented as dot density plots showing median and the interquartile 
range (IQR) b, Subclonal architecture of SCLC in comparison to lung 
adenocarcinoma (AD). Whole-genome sequencing data of SCLC and of 
adenocarcinoma (n = 15)"' was analysed for the presence of subclonal 
populations using clustering of the derived cancer cell fraction (CCF) of all 
single nucleotide mutations. To compare the emerging subclonal structure, we 
derived a subclonality score that takes into account the CCF of each sub- 
population as well as its mutational burden (see Methods). In order to prevent 
the low sequencing coverage (35 for SCLC and 63X for AD) from causing a 
systematic underrepresentation of the subclonal diversity in the mutation 
calls, we computed the contribution of a single read to the CCF on 


genome-wide average. After systematically determining a threshold within the 
average increase of CCF per read values (see Methods for details), we 
determined the group of samples for which a reliable estimation of the 
subclonality score is not possible (grey area). The subclonality scores of the 
remaining SCLC cases were then compared to those of the adenocarcinoma 
cases (P = 0.000232; Mann-Whitney test). c, Schematic representation of 
candidate genes with significant clustering of mutations in respective protein 
domains. Somatic mutations and genomic translocations are mapped to the 
respective protein regions. Hotspot mutations are highlighted in red. 

d, e, Genomic alterations in the RBI family proteins p107 (RBL1) and p130 
(RBL2) (d), and in KIT and PIK3CA (e). Somatic mutations in therapeutic 
target genes are listed and mapped to the protein domains of KIT and PIK3CA. 
Mutations with potential therapeutic implications are highlighted in red. 
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Extended Data Figure 4 | Clinical molecular-correlations of significantly determined by log-rank test. b, Analysis of CREBBP/EP300, TP73 and NOTCH 
mutated genes. a, Survival analysis of SCLC patients based on the status of alterations and their effect on clinical and genetic parameters. Statistical 
CREBBP/EP300, TP73 or NOTCH alterations. Statistical significance was significance was analysed by multinomial logistic regression. 
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Extended Data Figure 5 | Significant somatic copy number alterations in The copy number (CN) states were computed from SNP array (SNP 6.0) 
SCLC. a, Deletions of the chromosomal arm 3p point to the 3p14 (FHIT) and _and from whole-genome sequencing (WGS) data. The samples are sorted 
3p12 (ROBOI1) locus. b, Expression analyses of genes encoded on the 3p14.3- _ according to their amplitude of deletions or amplifications. e, Amplifications of 
3p14.2 and 3p12.2-3p12.2 locus. Histogram displaying the expression of IRS2 were determined by FISH analysis. IRS amplifications were quantified 
samples with focal deletions (blue) and samples without any copy number based on the ratio of red signals (IRS2-specific probe) to green signals 
alterations (white). Mean and standard error of the mean is plotted for each (centromere probe for chromosome 13). Lymphocyte spreads and SCLC 
gene in each group. Significant differences were determined by Mann-Whitney tumours without detectable IRS2 amplifications served as negative controls. 
test; *P < 0.05; **P < 0.01. c, d, Focal deletions of the CDKN2A (c) and focal Scale bar, 100 um. 

amplifications of IRS2 (d) were found on chromosome 9 and 13, respectively. 
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Extended Data Figure 6 | TP53 and RB1 alterations in SCLC. a, Distribution 
of somatic mutations in TP53 and RB1 according to the colour panel provided. 
b, c, Complex genomic rearrangements in RB1 showing homozygous 

deletions of exon 1 (b) or inversions within the RB1 gene (c). d, e, Annotated 
silent or missense mutations in RB1 occur at intron-exon junctions resulting in 


alternative splicing, intron retention (d) or exon skipping events (e). The 
coverage at the respective exon junctions is quantified as RPKM values. Sample 
$02194 is not holding any mutations at intron-exon junctions and is displayed 
as an example for unaltered splicing of RBI. 
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Extended Data Figure 7 | Chromothripsis in human SCLC. a, Circos plot of _ highlighted. b, Circos plots displaying fusion transcripts identified in the SCLC 


the chromothripsis sample $02353 showing intra- and interchromosomal chromothripsis cases (Supplementary Table 12) are represented as blue 
rearrangements between chromosome 3 and 11. The integral copy number (S02297) or red (S02353) lines for genes located on chromosome 3 and 11. 
state (iCN) is plotted as a heatmap and assigned to the respective chromosomal __c, Immunohistochemistry staining for p53, p14 (ARF) and p16 on FFPE 
regions. The chromosomal context of CCND1 (on chromosome 11) is material of the chromothripsis sample 02297. Original magnification, < 400. 
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Extended Data Figure 8 | Recurrent genomic translocations in SCLC. 

a, Recurrent genomic translocations (n = 14) affecting chromosome 22 are 
illustrated as a Circos plot highlighting the respective rearrangements as red 
connecting lines. b, Breakpoints in chromosome 22 map to intron 1 of TTC28 
and cluster downstream of the LINE1 (L1Hs) retrotransposon. Each arrow 
indicates the sample and the respective chromosomal position the segment 
translocates to. c, Schematic representation of the TP73 locus (hg19) describing 
complex intrachromosomal rearrangements of TP73 identified for S02397 and 
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$02243. Recurrent somatic mutations identified in Fig. 1a are mapped to the 
respective exons. d, Validation of somatic TP73 translocations. Genomic 
regions involved in the TP73 rearrangements were amplified in matched 
normal (N) and tumour (T) samples. The expected band size is indicated in 
brackets. The respective PCR products were subjected to Sanger sequencing to 
confirm the genomic breakpoint. e, Copy-number state of the TP73 gene in 
samples involved in genomic translocations. 
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Extended Data Figure 10 | Notch is a tumour suppressor in SCLC 
regulating neuroendocrine differentiation. a, Somatic mutations identified 
in NOTCH3 and NOTCH4 are mapped to the protein domains. Damaging 
mutations are highlighted in red. Mutations found in murine SCLC tumours 
are highlighted in blue. b, Quantification of tumour lesions and per cent 
tumour area to lung in TKO (n = 5) and TKO;N1ICD (n = 4) mice 3 months 
after Ad-Cre instillation. Statistical significance was determined by two-tailed 
unpaired Student’s t-test. c, Representative immunohistochemistry for GFP 
or tdTomato in lungs from TKO;Rosa26"""”"° mice approximately 6 months 
after tumour induction. Left scale bar, 500 lm; right and middle: scale bar, 
50 um. d, Representative immunostaining for Notch2 in lungs from 
TKO;Rosa26\”""? mice approximately 6 months after tumour induction. Left 
scale bar, 500 im; right scale bar, 50 jum. e, Quantification of the per cent 
recombination at the Rosa26 locus in TKO;Rosa26""""* (n = 6) and 
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viability assay of the human SCLC cell line NJH29 transfected with a NIICD 
(Notch1) expression plasmid or empty vector control (Ctrl) (3 independent 
biological replicas with 3 technical replicas each). Fold growth was normalized 
to day 0; representative images were taken on day 6. Scale bar, 50 jim. 

g, Immunohistochemistry staining in FFPE embedded tissues of TKO and 
TKO;N2ICD mice. Scale bar, 50 um. h, Quantitative RT-PCR validation of 
Notch1 induction and the expression of common Notch target genes after 
N1ICD transfection in murine SCLC cells (three biological replicas; two-tailed 
paired Student’s t-test). i, Mouse SCLC cells transfected with control or N1ICD 
(Notch1) were analysed 48 h later by gene expression microarrays. Gene 

Set Enrichment Analysis (GSEA) was performed on these data; selected 
significant gene sets are displayed. j, k, EdU analysis of mouse (j) and human 
(k) SCLC cells (three independent biological replicas with three technical 
replicas each; two-tailed paired Student’s t-test). *P < 0.05; **P < 0.01; 

***P < (0.001. Data are represented as mean + s.d. 
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HipBA-promoter structures reveal the 
basis of heritable multidrug tolerance 


Maria A. Schumacher!*, Pooja Balani**, Jungki Min!, Naga Babu Chinnam!, Sonja Hansen7+, Marin Vulic?+, 


Kim Lewis? & Richard G. Brennan! 


Multidrug tolerance is largely responsible for chronic infections and caused by a small population of dormant cells called 
persisters. Selection for survival in the presence of antibiotics produced the first genetic link to multidrug tolerance: 
a mutant in the Escherichia coli hipA locus. HipA encodes a serine-protein kinase, the multidrug tolerance activity of 
which is neutralized by binding to the transcriptional regulator HipB and hipBA promoter. The physiological role of HipA in 
multidrug tolerance, however, has been unclear. Here we show that wild-type HipA contributes to persister formation and 
that high-persister hipA mutants cause multidrug tolerance in urinary tract infections. Perplexingly, high-persister 
mutations map to the N-subdomain-1 of HipA far from its active site. Structures of higher-order HipA-HipB-promoter 
complexes reveal HipA forms dimers in these assemblies via N-subdomain-1 interactions that occlude their active sites. 
High-persistence mutations, therefore, diminish HipA-HipA dimerization, thereby unleashing HipA to effect multidrug 
tolerance. Thus, our studies reveal the mechanistic basis of heritable, clinically relevant antibiotic tolerance. 


Bacterial multidrug tolerance (MDT) is largely responsible for 
the inability of antibiotics to eradicate infections and is caused by a 
subpopulation of phenotypic variants called persisters'. Because bac- 
tericidal antibiotics target processes in metabolically active cells, pers- 
isters, which are dormant, survive”. Persisters that resume growth 
after antibiotic removal lead to recurrent infections, especially those 
caused by biofilms'*°. Hence, MDT represents a significant threat to 
human health. However, elucidation of the mechanisms that drive 
MDT has been hampered by the rarity of persisters; typically only 
one in 10° cells becomes a persister. The first persister locus, hipA7, 
was identified in E. coli three decades ago®. The hipA7 locus, which 
leads to a 1,000-fold increase in persisters, contains two mutations, 
G22S and D291A, in the HipA protein’. HipA is a 440-residue 
protein that is co-transcribed with the 88-residue helix-turn-helix 
containing HipB DNA-binding protein®*°. HipB forms a complex 
with HipA and promoter DNA, neutralizing the MDT activity of 
HipA. Thus, HipA and HipB form a toxin-antitoxin module'*’. 

Recent data revealed HipA is a serine-protein kinase’’""* that phos- 
phorylates glutamyl-transfer RNA synthase’*"*, inhibiting protein 
synthesis and driving cells into dormancy. Formation of the HipA- 
HipB-hipBA promoter complex, which consists of several HipA and 
HipB molecules bound to multiple operator sites, maintains HipA in 
an inactive state and mediates transcription autorepression®”””. 
However, the mechanisms involved in these inhibitory activities are 
not understood. Mutants in hipA have been used as a convenient 
model to study persisters'”"°, but it has been unclear whether the 
kinase plays a role in MDT in vivo. Here we show that wild-type 
(WT) HipA contributes to persister formation, and that hipA7 
high-persister mutants are found in patients with urinary tract infec- 
tions (UTIs). The molecular basis for the high-persister phenotypes 
was revealed by structures of higher-order HipA—HipB-hipBA pro- 
moter complexes, emphasizing the importance of structural studies 
on fully assembled transcription promoter complexes in elucidating 
complex biological regulatory mechanisms. 


HipA contributes to antibiotic tolerance 


E. coli HipA was the first bona fide MDT-inducing protein identified’. 
Ectopic overexpression of the native HipA protein increases persister 
production. However, deletion of the hipA gene produces no pheno- 
type’, either because of redundant mechanisms of persister formation 
or because HipA does not play a role in antibiotic tolerance’?”'. 
However, if only a small fraction of cells expressing HipA become 
persisters, this would be overlooked in studies on bulk populations. 
We therefore examined the correlation between hipBA expression 
and persister formation in single cells using a plasmid-encoded 
promoter) ;)g4—green fluorescent protein (GFP) fusion, which reports 
chromosomal hipBA expression. Fluorescence-activated cell sorting 
(FACS) of this population showed a random distribution, and dim, 
middle and bright cells were collected and exposed to ofloxacin 
(Fig. 1a). Survival was considerably better in bright cells and depended 
on the presence of the chromosomal hipBA locus, suggesting that 
stochastic expression of the hipBA operon indeed causes persister 
formation. Analysis of the hipA7 mutant showed a similar pattern 
but shifted to higher levels of expression (Fig. 1a). These results dem- 
onstrate that hipA contributes to persister formation in WT E. coli, 
and that the hipA7 mutation somehow leads to elevated expression of 
the hipBA operon. 


HipA causes MDT UTIs 


The hipA7 mutant was originally obtained in the Moyed laboratory 
by selecting for increased antibiotic tolerance and has been widely 
used as a model for studying persisters®. To obtain additional higher- 
persister HipA mutants, we performed a similar selection for 
increased tolerance to a combination of ampicillin and cefotaxim. 
To enrich for high-persister mutants, surviving cells were cultured 
and challenged again with antibiotics. Whole-genome sequencing 
identified an additional hipA allele, P86L, which displayed a high- 
persister phenotype similar to hipA7 (Fig. 1b and Extended Data 
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Figure 1 | Expression of hipA leads to persister formation. a, GFP 
expression levels of plasmid carrying promoter hipBA-GFP construct in WT, 
hipA7 and AhipBA strains. One hundred thousand FACS-sorted cells were 
collected representing dim, middle and bright populations and survival to 
ofloxacin was measured. b, Exponentially growing cultures were treated 

with ampicillin (left panel) or ciprofloxacin (right panel). Samples were 
taken at indicated times and surviving bacteria were determined by colony 
count. Values in a and b are an average of at least three biological replicates; 
error bars, s.d. 


Fig. 1). Given the dramatically improved survival of these mutants in 
the presence of antibiotics, we reasoned that such strains might be 
selected for not only in the laboratory, but also in nature. 

UTIs are the most common chronic infections in humans and are 
caused primarily by E. coli”””’. It has been suggested that the recurrent 
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nature of such UTI infections may be driven by drug tolerance”. 
Indeed, dormant E. coli cells tolerant to antibiotics are present within 
the bladder epithelium. However, the mechanism of their tolerance is 
unknown. We therefore screened a library of 477 E. coli isolates, both 
commensal and from patients with UTIs, in search of hipA mutations. 
Strikingly, sequencing of the hipA genes revealed 23 hipA7 mutants 
(both G22S and D291A substitutions were present in these mutants) 
and a hipA(P86L) mutant (Supplementary Table 1). Deletion of the 
hipA7 allele in a UTI isolate caused a sharp decline in antibiotic tol- 
erance, confirming the functionality of this mutation (Fig. 2a). Next, we 
examined the antibiotic tolerance of E. coli hipA7 that had infected 
human bladder cells. A strain deleted in hipA7 showed a sharp decrease 
in the level of persisters surviving treatment with ciprofloxacin, an 
antibiotic routinely used to treat UTIs (Fig. 2b). Thus, these data pro- 
vide evidence that hipA mutations, including hipA7, are important 
players in clinically relevant E. coli MDT infections. 


Mapping HipA high-persister mutants 

In addition to hipA(P86L) and hipA7 (hipA(G22S-D291A)), 
hipA(D88N), which was isolated in an earlier laboratory screen’, also 
leads to a high-persister phenotype. For hipA7, MDT is conferred by 
the G22S mutation, while D291A has a dampening effect (M.V., 
unpublished observations). Notably, each of these high-persister 
mutations map to HipA N-subdomain-1, which contains a unique 
fold (Fig. 2c)''. In the structure of HipA-HipB bound to a single 
operator, Gly22, Pro86 and Asp88 are far from the HipA active site 
and HipB interacting surface (Fig. 2c)'’. However, data suggest that 
when HipA is assembled with HipB on the hipBA promoter, which 
contains multiple operators, its MDT activity is inhibited”'® by some 
unknown mechanism(s). 


Characterization of hipBA promoters 


The hipBA system appears widespread among Gram-negative species. 
However, only the E. coli hipBA promoter has been characterized. 
This promoter contains four operators bound by HipB with the 
consensus TATCCNsGGATA*™. To identify additional hipBA pro- 
moters, we searched the genomes of multiple bacteria (see Methods). 
These analyses revealed hipBA promoters and hipB and hipA genes in 
numerous Gram-negative bacteria. Intriguingly, all promoters con- 
tained two to four operators that are separated by 10 base pairs (bp), 
except for the O2 and O3 operators in E. coli and Shigella sonnei, 
which are linked by a 21 or 22 bp spacer (Fig. 3a). To deduce the 


Figure 2 | Mutants in hipA7 are found among 
pathogenic and commensal strains of E. coli 
and cause a high-persister phenotype in clinical 
isolates and bladder cells. a, The hipA7 allele 
confers a high-persister phenotype to an 
exponentially growing clinical isolate of E. coli 
treated with ciprofloxacin. b, E. coli infecting HTB-9 
human bladder cells. Strains (W226 hipA7, W226 
AhipA and W226 WT) were treated with 
ciprofloxacin (cipro). *P < 0.05; NS, non- 
significant; c.f.u., colony-forming units. c, HipA 
high-persister mutation sites, G22S, P86L and 
D88N, localize to the N-subdomain-1, far from the 
active site, DNA and HipB dimer. Values in a and 
b are an average of at least three biological 
replicates; error bars, s.d. 
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Figure 3 | Structures of HipB-(01-O2) and HipA-HipB-(O1-02) 
complexes. a, The hipBA promoter organization in various bacteria. Notable 
features are the conservation of the operator sequence and 10 bp spacing 
between operators (indicated) except E. coli and S. sonnei, which contain 21 
or 22 bp O2-O3 linkers. b, The HipA-HipB-DNA structure (right) is shown 
in the same orientation as the HipB-(O1-O2) complex (left) with HipA 
molecules (red and pink) shown as transparent surfaces. Both structures 
adopt the same extended conformation. 


molecular mechanisms of hipBA autorepression and higher-order 
promoter assembly, we next determined structures of HipB and the 
HipA-HipB complex bound to the ‘minimal’ hipBA promoter com- 
posed of the O1-O2 operators. 

Structures of HipB bound to either a 48 or a 50 bp hipBA promoter 
were obtained to 3.35 and 3.50 A resolution, respectively (Extended 
Data Table 1, Extended Data Fig. 2a and Fig. 3b). Specific binding by 
HipB dimers induces ~70° bends in consecutive operators as well as 
significant DNA deformations in the 10 bp linker, which results in 
the HipB dimers being positioned on opposite faces of the DNA 
(Fig. 3b). Thus, unlike previous suggestions that HipB binding to 
multiple hipBA operators might lead to DNA wrapping”, both 
HipB-(O1-O2) structures reveal that, although the DNA is signifi- 
cantly distorted, the HipB-(O1-O2) complexes are extended 
(Extended Data Fig. 3a). That this structure is also formed in solution 
is supported by atomic force microscopy (AFM) experiments, which 
show HipB dimer pairs as single irregular spheres on closely spaced 
O1-O2 and 03-04 sites on an extended DNA template (Extended 
Data Fig. 3b-d). 


Structure of the HipBA-promoter complex 


A HipA-HipB-(O1-O2) structure was next obtained to 3.77 A reso- 
lution (Extended Data Table 2 and Extended Data Fig. 2b) and 
revealed the same extended conformation as the HipB-(O1-O2) 
complexes, with each HipA contacting the side of one HipB dimer 
(Figs 3b and 4a, b). As noted, it was unclear how HipA kinase activity 
is maintained in an inactive state in the promoter complex as the 
HipA active sites are exposed in the structure of HipA-HipB bound 
to a single operator’'. The HipA-HipB-(O1-O2) structure reveals 
why complex formation and, more pointedly, why the extended 
conformation of the complex are essential to HipA inhibition. 
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Specifically, the HipB-induced bends in the O1 and O2 operators 
juxtapose the normally monomeric HipA molecules, allowing them 
to dimerize. Strikingly, HipA dimerization, which buries ~ 1,000 A? of 
surface area, blocks the active sites of each HipA molecule (Fig. 4a, b). 
Further, HipA dimerization positions the active sites in proximity, 
probably precluding the extrusion of the HipA P-loop, which can be 
trans-autophosphorylated, causing HipA inactivation (Extended 
Data Fig. 4)'*. While it is possible that structural changes in the 
P-loops and adjacent activation regions might allow them to access 
the extruded state, this seems unlikely, and preventing inadvertent 
P-loop ejection and phosphorylation would keep HipA primed for 
catalysis upon promoter release. 


Mechanism for high-persister phenotype 


As previously noted", Asp291 is located close to the HipB-HipA 
interface (Fig. 4b) but the roles of the HipA higher-persister 
mutations located on N-subdomain-1 were unclear (Fig. 2c). The 
HipA-HipB-(O1-O2) structure provides an explanation for these 
high-persister mutations as it reveals that N-subdomain-1 constitutes 
the majority of the HipA dimerization interface in the complex. 
N-subdomain-1 regions involved in dimerization include residues 
20-25 and 53-90, both of which encompass the high-persister muta- 
tions. C-domain residues 263 and 270 provide the only additional 
dimer contacts. Although the detailed locations of the atoms in the 
HipA side chains cannot be discerned at this resolution, Gly22, Pro86 
and Asp88 are positioned precisely at the dimer interface in the 
HipA-HipB-(O1-O2) structure. In high-resolution HipA structures, 
Asp88 and Pro86 participate in helical capping and hence stabiliza- 
tion of the relatively short «4 helix, which is important in HipA dimer 
contacts (Extended Data Fig. 5a, b and Fig. 4c). The two Gly22 resi- 
dues directly abut and form the nexus of the dimer (Extended Data 
Fig. 5b). Modelling shows that substitution of this glycine to any other 
residue would lead to steric clash between subunits in the dimer 
(Fig. 4c). 

The HipA-HipB-(O1-O2) structure also explains previous DNase I 
footprinting data showing that HipA makes multiple DNA contacts’. 
In the HipA-HipB-(O1-O2) structure, the DNA-facing surface of the 
HipA dimer is electropositive, and residues Lys3, Lys27, Arg49, Asn51 
and Thr53 from HipA N-subdomain-1 and Lys379 and Arg382 from 
the C-domain are positioned to make DNA phosphate contacts 
(Extended Data Fig. 6). Notably, in the HipA-HipB-(O1-O2) structure, 
although the crystallization mixture contained a 2:2 HipB:HipA ratio, 
HipA molecules are only bound to HipB subunits facing the 10 bp 
spacer. This binding arrangement not only allows HipA-HipA dimer- 
ization, but also HipA-DNA contacts and hence would be predicted to 
be energetically favoured over binding to the outside-facing HipB 
subunits. Regardless, previous studies showed HipA is capable of 
binding HipB with a 2:2 ratio'’. It is possible that HipA molecules 
bound at the outer edges of promoter regions that do not dimerize 
could be active. To this point, however, studies indicate that HipB and 
HipA expression in cells is extremely low; it has been estimated that 
the hipA and hipB genes, which contain an unusual number of non- 
optimal codons and are under the control of a non-optimal promoter, 
may be transcribed only ~0.25 times per generation’. Expression 
analyses also suggest that HipB may be present at higher concentra- 
tions than HipA’. Thus, while the ratio of HipA to HipB and their 
intracellular concentrations, free and DNA-bound, are not known, 
multiple mechanisms appear to have been selected to ensure low 
protein levels, in particular of HipA, including possibly insufficient 
amounts of HipA to occupy the outside-facing HipB subunit. 

Combined, our structural data suggest a molecular explanation for 
the hipA7 high-persister phenotype identified decades ago® as they 
predict that mutation of residues in the HipA-HipA interface, par- 
ticularly the G22S substitution, would impair this dimerization, 
thereby disfavouring HipA-HipB-(O1-O2) complex formation. 
To test this structural hypothesis, we examined the HipA-HipA 
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Figure 4 | Dimerization-mediated inhibition of 
HipA and structural mechanism underlying the 
high-persistence phenotype of N-subdomain-1 
mutations. a, HipA—HipB-(O1-O2) structure. 
The HipA active sites, denoted by ATP molecules, 
which are not present in the structure, are occluded 
by HipA dimerization. b, Locations of HipA7 
high-persistence mutations in the HipA—HipB- 
(O1-O2) complex. Gly22 and Asp291 are shown 
as green spheres. c, Close up of HipA dimer 
interface. Gly22 residues (spheres) are positioned 
precisely at the dimer centre. High-persistence 
mutations, D88N and P86L, shown as sticks, also 
localize to this interface. d, Representative binding 
isotherms of WT HipA (red plot), HipA(G22S) 
(blue plot), HipA(D88N) (green plot) and 
HipA(P86L) (black plot) and the HipB-(O1-O2) 
complex. Values in this figure are averages of 
three experiments. 
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interaction in the context of a fluorescently labelled O1-O2 DNA 
(F-O1-O2) complexed with HipB (Methods). Wild-type HipA 
bound the HipB-(F-O1-O2) complex with a dissociation constant 
(Ka) of 50 + 5.7nM, while the mutants displayed reduced binding, 
with Kg values of 374 + 40nM, 280 + 80nM and 87 + 6.0nM for 
HipA(G22S), HipA(P86L) and HipA(D88N), respectively (Fig. 4d). 
That HipA(D88N) was not as affected in its binding is consistent with 
previous data showing this mutation causes a less severe persistence 
phenotype’. Thus, these data support the structural mechanism for 
the high-persister phenotype and explain the increased HipA protein 
expression in hipA7 strains (Fig. 1a). Simply, there is less HipA7 
bound at the promoter and HipA is required for full repression of 
the hipBA operon”””. 


Models for HipBA-promoter complexes 


The HipA-HipB-(O1-O2) structure reveals that the conserved 10 bp 
linker between operators provides appropriate spacing for HipA 
dimerization. However, in the E. coli and S. sonnei hipBA promoters, 
O2 and O3 are separated by 21 and 22 bp, respectively, which mod- 
elling indicates is too long to allow HipA-HipA contacts. To gain 
insight into HipA-HipB organization on the O2-O3 region of the 
promoter, we determined the structure of a HipA-HipB-(O2-03) 
complex to 3.99 A. Remarkably, the structure showed that the 21 bp 
DNA linker is disordered and apparently extruded, thereby allowing 
the HipA molecules to form the same dimer as observed in the 01-O2 
complex (Extended Data Fig. 7a, b). Thus, while ~10 bp may be an 
energetically optimal inter-operator spacing to allow HipA-HipA 
contacts, the addition of an extra turn of DNA helix does not preclude 
the formation of HipA dimers. Combining this result with the HipA- 
HipB-(O1-O2) structure allows us to generate models for higher- 
order bacterial HipA-HipB-hipBA promoter complexes (Fig. 5). 


The hipBA transcription repression mechanism 


In addition to controlling the MDT activity of HipA, HipA-HipB- 
promoter complex formation also mediates transcription autorepres- 
sion’. Bacterial RNA polymerase (RNAP) is recruited to promoters 
via specific interactions between its o-factor and the —35 and 
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—10 boxes****. In the E. coli hipBA promoter, these elements are 
located at the 5’ and 3’ ends of 02"°. The HipB-(O1-O2) structure 
includes the — 10 and most of the —35 element. Thus, to gain insight 
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Figure 5 | Structures of fully assembled HipA-HipB-hipBA promoter 
complexes. Left: schematic organizations of hipBA promoters ascertained 
by bioinformatic analyses. Right: structure of the HipA-HipB-(O1-O2) 
complex (top) and deduced structures of the HipA-HipB-(O1-O2-03) 
(middle) and HipA-HipB-(O1-O2-03-O4) (bottom) complexes. 
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Figure 6 | Molecular mechanisms of HipB and 
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structures (left). O2 encompasses the —35 and 
—10 boxes of the hipBA promoter and is 
dramatically distorted by HipB-HipA binding, 
consequently reconfiguring the major and minor 
groove positions to preclude o-factor binding 
(right). b, Superimposition of the —35 and —10 
elements of the HipA-HipB-(O1-O2) and RNAP 
holoenzme-DNA complexes (1L9Z)”° reveals 
HipB-RNAP steric clash and reorientation of 
downstream DNA. c, Superimposition of 

the —35 and —10 elements of the modelled 
HipA-HipB-(O1-O2-O3) and RNAP 
holoenzme-DNA complexes reveals additional 
HipA-o clash. 
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into the mechanism(s) of HipB and HipA-HipB autorepression, 
we analysed the DNA conformation surrounding O2 and overlaid 
HipA-HipB-promoter structures onto the RNAP holoenzyme- 
DNA structure. These analyses show that HipB-induced DNA dis- 
tortions in O2 repositions the —35 and — 10 boxes to lie on opposite 
faces of the DNA, probably precluding productive RNAP binding 
(Fig. 6a). Further, these superimpositions demonstrated that HipB, 
and toa lesser degree HipA, would sterically block RNAP access to the 
—35 box (Fig. 6b). HipA-HipB-mediated DNA distortion also bends 
the DNA downstream of — 10 opposite to the direction used by RNAP 
to initiate transcription. Finally, superimposition of the HipA—HipB- 
(O1-O2-O03) complex onto the RNAP holoenzyme-DNA complex 
shows that the HipA molecule bound between O2 and O3 would 
sterically impede RNAP binding (Fig. 6c). 

Our analyses indicate that HipB and HipA-HipB promoter- 
binding present a strong physical roadblock for RNAP binding and 
misalign key promoter elements. These findings suggest that the posi- 
tions of the —35 and —10 boxes in the E. coli hipBA promoter have 
evolved to ensure highly efficient repression and hence might be 
similarly organized in other hipBA promoters. Indeed, we found that 
hipBA promoters in which the —35 and —10 elements could be read- 
ily predicted harbour the same relative positioning as found in the 
E. coli promoter (Extended Data Fig. 8), suggesting a conserved hipBA 
autorepression mechanism across Gram-negative bacteria. 


Discussion 


Persisters are the main culprits responsible for recalcitrance of 
chronic infections to antibiotic therapy’*°’?. RNA endonuclease 
toxin-antitoxin modules, which use mRNA-degrading RNases to 
inhibit translation, have been shown to be expressed in persisters***" 
and deleting ten of them sharply decreases antibiotic tolerance’. We 
show here by single-cell analysis that tolerance increases in a small 
population of E. coli stochastically overexpressing HipA alone. 
Moreover, this kinase becomes the dominant factor determining tol- 
erance in hipA7 mutants, where it accounts for 99.9% of all persisters. 
We also show that hipA7 mutants are present in E. coli isolates 
from patients with UTIs, providing a direct link with this particular 
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mechanism of persister formation and the clinical manifestation of 
disease. Target modification is the most common mechanism by 
which bacteria acquire resistance to antibiotics. The UTI hipA7 E. coli 
described in this study provides a precedent to a parallel evolutionary 
mechanism imparting increased tolerance to antibiotics. 

Mutations in hipA that lead to a high-persister phenotype and UTIs 
localize to HipA N-subdomain-1, a surface exposed region distal to 
the HipA kinase active site and HipB binding region. Our structural 
studies reveal the importance of this region as they show that when 
HipA forms higher-order promoter complexes with HipB and mul- 
tiple operators in hipBA promoters, the kinase forms dimers via inter- 
actions between operator-adjacent N-subdomains-1. Critically, the 
formation of these HipA dimers blocks their active sites. Hence, muta- 
tions in this HipA-HipA interface would liberate HipA from its inact- 
ive state, leading to increased persistence. These structures also 
explain the basis for hipBA autorepression, as HipB-HipA binding 
to the promoter distorts critical promoter elements required for pro- 
ductive RNAP binding and functions as a roadblock to efficient tran- 
scription. Finally, in general, the structural mechanisms by which 
promoters containing multiple DNA-binding elements are regulated 
are poorly understood, as structures have largely been determined 
for individual transcription-factor-DNA complexes. Our studies 
emphasize the importance of visualizing higher-order assemblies to 
understand not only the mechanisms of transcription regulation fully, 
but also how the functions of the DNA-bound regulatory proteins 
themselves and, in turn, cellular physiology are impacted by assembly 
formation. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 

Bacterial strain construction. The hipAP86L and hipA7 mutants in the WT 
(MG1655) background were constructed using the P1 transduction method”. 
The deletion mutant of hipA in uropathogenic E. coli W226 was constructed using 
a modified lambda Red recombination method”. A kan/parE cassette under the 
control of a rhamnose promoter was used to replace hipA**. Colonies were 
selected on Luria Bertani agar (LBA) + kanamycin plates and mutations con- 
firmed by lack of growth on minimal-MOPS agar plates containing 0.5% rham- 
nose as a sole carbon source. For replacement of the kan/parE cassette with a WT 
hipA, the allele was amplified from WT MG1655 strain and inserted into target 
cells by transformation. Colonies that successfully replaced the cassette with the 
new hipA allele were selected on minimal-MOPS rhamnose agar plates. The 
mutants used for sorting were constructed by inserting a reporter plasmid 
(pUA66) obtained from the E. coli promoter library, which had a hipA promoter 
transcriptionally fused to GFP**. The plasmids were transformed into a WT 
(MG1655) and hipA7 strain. The representative E. coli commensal and UTI 
clinical strains were obtained from H. Ochman, R. K. Selander and the ECOR 
reference collection***’. The UTI strains were obtained from A. E. Stapleton and 
J. R. Johnson. 

Mutagenesis of WT E. coli to obtain high-persister mutants. An E. coli K12 
MG1655 WT strain was used for the selection of in vitro hip mutants via 
a modified version of the original Moyed and Bertrand screen’. Briefly, a WT 
E. coli culture was grown overnight from a single colony. The overnight culture 
was diluted 1:500 in Luria Bertani broth (LBB) and grown for 2 h to an absorbance 
(A600 nm) of 0.2. This culture was diluted 1:50 in LBB and regrown to Agoo nm = 0.2. 
The culture was diluted once again and regrown. The serial dilutions and 
regrowth allowed the removal of pre-existing persisters**”°. After the third 
growth step, cultures were diluted 2:1 in 125 mM HEPES/KOH pH 7 in a 2 ml 
tube and mutagenized with 15 jg ml’ ethyl methanesulfonate (EMS) for 45 min 
at 37 °C**. EMS was removed by centrifugation and the cell pellet was resuspended 
in LBB and grown overnight to allow for segregation. The mutagenized pool was 
plated onto LBA and LBA with 100 ig ml’ rifampicin. Colony-forming units 
were scored after incubation for 24h and frequency of rifampicin-resistant 
mutants was calculated as a measure of efficiency of mutagenesis. 

Enrichment for high-persister mutants from the mutagenized pools. To enrich 
for high-persister mutants, the mutagenized pool was diluted 1:750 in fresh 
medium and grown to mid-exponential phase before challenging with a com- 
bination of 100 pg ml”! ampicillin and 50 jg ml! cefotaxime for 4h. The chal- 
lenge was done with two antibiotics to minimize the selection of resistant 
mutants. The cells were then washed twice with 1% NaCl, resuspended in fresh 
medium and grown overnight. This cycle of antibiotic treatment and regrowth 
was repeated two more times and the persister fractions were measured at the end 
of each cycle by plating the dilutions onto LBA and scoring the colony-forming 
units. After the final round of enrichment, 100 pl of the mutagenized pools were 
plated onto four LB plates and incubated overnight to allow colonies to grow. 
Twelve clones were selected randomly from well-isolated colonies and streaked 
for purity. Minimal inhibitory concentrations (MICs) of all clones for ampicillin, 
cefotaxime and ofloxacin were established to eliminate any possible resistant 
mutants. In addition, the growth rates of individual clones were tested to elim- 
inate mutants with growth abnormalities. Ten mutants exhibited MICs and 
growth rates similar to the parent WT strain. The hip phenotype of the clones 
was tested in exponential phase with 100,:gml~’ ampicillin, as well as in 
stationary phase with 5,1gml~' ofloxacin. Clones with increased antibiotic 
tolerance in both exponential and stationary phases were sent for whole-genome 
sequencing to identify mutated genes. 

Growth rate and MIC measurement. The growth rates of the selected high- 
persister mutants were measured in LB broth. Overnight cultures were 
diluted 1:1,000 in fresh medium and grown for 6h. Growth was measured by 
plating dilutions onto LB agar and counting colony-forming units per millilitre 
(c.f.u. ml!) every 30 min. To eliminate resistant mutants, the MICs of all selected 
clones were determined for ampicillin, cefotaxime and ofloxacin. The MIC mea- 
surements were made by standard MIC assays according to Clinical and 
Laboratory Standards Institute guidelines. 

Selection of high-persister candidate clones. The high-persister clones for 
whole-genome sequencing were selected on the basis of three criteria: high-pers- 
ister phenotype, WT MICs and WT growth rates. Nine clones that met these 
criteria were selected for whole-genome sequencing. 

Whole-genome sequencing. Whole-genome sequencing was performed on 
DNA extracted from high-persister candidate strains using a Qiagen DNEasy 
DNA extraction kit. To identify the mutations responsible for the high-persister 
phenotype in the selected in vitro hip mutants, whole-genome sequencing of these 
strains was done at the Broad Institute (Cambridge, Massachusetts, USA) using 
Solexa sequencing technology. The genomes of the sequenced hip mutants were 
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compared with the WT parent strain to determine the SNPs generated by EMS 
mutagenesis. 

Persister assays. Persister assays for exponentially growing cultures were done by 
diluting overnight cultures 1:100 in fresh medium and then growing the cultures 
to late exponential phase (~5 X 10” c.f.u. ml‘). An aliquot of the culture was 
taken before the addition of antibiotic to measure the initial colony-forming unit 
counts. Antibiotic was then added and aliquots were removed at 1, 3 and 6h time 
points, washed with 1% NaCl, serially diluted and plated onto LBA or MacConkey 
agar, as indicated. 

Sorting persisters expressing GFP using FACS. Bacterial cultures containing 
the reporter plasmid were grown overnight from freezer stocks. The overnight 
cultures were further diluted 1:100 in fresh medium containing kanamycin and 
grown to late exponential phase. At this stage, cells were washed twice with sterile 
PBS and run through the FACS Aria II cell sorter. Individual cells were sorted on 
the basis of their GFP expression levels. Total cell population GFP levels were 
measured by analysing 10,000 cells per sample. The cell populations were divided 
into three fractions based on their intensity of fluorescence (dim, middle and 
bright) and sorted into separate tubes with each fraction containing 100,000 cells. 
The fractions were then individually resuspended into media containing 5 ug ml 
ofloxacin for 3 h and the dilutions were plated onto LBA to count colony-forming 
units per millilitre. 

E. coli infection of bladder cell cultures. HTB-9 human bladder cells (American 
Type Culture Collection (ATCC) 5637) were obtained from ATCC and grown to 
80% confluence in 24-well plates in RPMI1640 supplemented with 10% fetal 
bovine serum, 2mM t-glutamine, 10mM HEPES, 1mM sodium pyruvate, 
4,500 mg 1~' glucose, and 1,500 mg 1”! sodium bicarbonate, and incubated at 
37 °C in a 5% CO, atmosphere. Cells were examined daily under the microscope 
to check for contamination and healthy growth. E. coli cultures for infection were 
grown in LBB for 24h at 37 °C in static conditions to induce type 1 pilus forma- 
tion, which is necessary for cell attachment. Once the bladder cell cultures 
reached the desired confluency, the cells were infected with the E. coli cultures 
as described in ref. 40. Infected cells were then washed twice with PBS2+, and 
fresh media containing either 2 1g ml * ciprofloxacin or 10 1g ml ' gentamycin 
(control) was added to the appropriate wells to determine the persister fraction in 
the infected cells. After 6 h of incubation, the bladder cells were lysed with 1 ml of 
0.4% Triton X-100 in PBS. All steps subsequent to lysis were performed on ice, to 
prevent loss of bacteria. The lysates were diluted and 10 ul were plated in tripli- 
cates on MacConkey agar to count survivors. Five hundred microlitres of the 
undiluted lysates were plated onto MacConkey agar plates to increase the limit 
of detection. 

Statistical analysis. No statistical methods were used to predetermine sample 
size. The experiments were not randomized. The investigators were not blinded to 
allocation during experiments and outcome assessment. 

The statistical significance for bladder cell experiments was calculated using 
unpaired two-tailed t-tests in GraphPad Prism 6 software. P values below 0.05 
were considered significant. 

Protein expression, purification and crystallization of HipB-(O1-O2), HipA- 
HipB-(O1-O2) and HipA-HipB-(O2-O3) complexes. HipB was purified in a 
single step via Ni-NTA chromatography. In all crystallization experiments, the 
His-tag was removed by thrombin cleavage. HipA(D309Q), in which the catalytic 
base, D309, was mutated to glutamine rendering the protein inactive, was 
expressed and purified as previously described''’. HipA(D309Q) was used in all 
structural studies and hence will be referred to as HipA. To obtain HipB-DNA 
crystals, HipB was concentrated to 20 mg ml~’ and mixed at a stoichiometry of 
one DNA duplex to two HipB dimers. Two crystal forms of HipB-(O1-O2) 
complexes were obtained. The P2,2,2 form was grown using an O1-O2 DNA 
duplex with the sequence 5’-TTATCCGCTTAAGGGGATATTATAAGTTT 
TATCCTTTAGTGAGGATAA-3’. Crystals were produced by mixing the 
HipB-DNA solution, 1:1, with 38% MPD and 0.1 M sodium acetate pH 4.6. 
The resultant crystals were cryo-preserved straight from the drop for data col- 
lection. A C2 crystal form was obtained by combining HipB with a DNA contain- 
ing an O1 site with a self-complementary 10 bp overhang designed to generate a 
50 bp symmetric O1-O1 site (a pseudo-O1-O2 site). The DNA was mixed 1:1 
with 20 mg ml of HipB dimer and crystallized using 27% PEG 8000, 0.1 M MES 
pH 6.5. The crystals were cryo-preserved using the crystallization solution sup- 
plemented with 20% glycerol. Crystals of the HipA-HipB-(O1-O2) complex 
were obtained by using the self-complementary O1 site, which generated the 
50 bp symmetric operator duplex. To crystallize the HipA-HipB-DNA complex, 
HipA, HipB dimer and DNA duplex were mixed 2:1:1 and added 1:1 (v:v) to4 M 
sodium formate. The crystals took the tetragonal, space group P432)2. A 57-base 
oligonucleotide encompassing the hipBA O2-O3 site (5’-TATCCCGTAGAG 
CGGATAAGATGTGTTTCCAGATTGACTTATCCTCACTAAAGGATA-3’) 
was used to crystallize the HipA-HipB-(O2-O3) complex. To form the ternary 
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complex, HipA, HipB dimer and DNA duplex were mixed 4:2:1 and the solution 
combined with 1.3 M ammonium sulfate, 0.1 M citrate pH 5.6. The crystals were 
cryo-preserved by dipping them in the crystallization solution supplemented with 
15% glycerol for several seconds before placement in the cryo-stream. All X-ray 
intensity data were collected at the Advanced Light Source beamline 8.3.1 and 
processed with MOSFLM. 
Structure determinations of the HipB-(O1-O2), HipA-HipB-(O1-O2) and 
HipA-HipB-(O2-03) complexes. The HipB-(O1-O2) complex structure was 
determined using the previously determined HipA-HipB-21-base oligonucleo- 
tide DNA structure”’, as a search model after removing HipA. There isa full 48 bp 
O1-O2 fragment and two HipB dimers in the asymmetric unit. Molecular 
replacement using Phaser produced two solutions and, after refinement, electron 
density for the DNA connecting the O1 and O2 operators was revealed and 
modelled. The structure was refined using CNS and Phenix*’. The HipB com- 
plex with the two-fold related O1 sites was also solved using the HipB dimer-O1 
complex. One solution was obtained and crystal symmetry generated the 
extended complex. After refinement, density for the connecting DNA was evident 
and modelled. However, the value of Rgee remained high (> 39%). After 
several rounds of refitting and refinement, clear electron density was observed 
for a non-specifically bound HipB dimer in which one HipB subunit makes 
semi-specific contacts while the other makes no base contacts. Notably, this 
non-specific complex may represent a snapshot of HipB interaction with DNA 
before forming a specific complex with the operator and concomitant bending. 
Addition of the extra HipB dimer led to convergence of refinement to a value of 
Réree = 28.9%. The HipB-(O1-O2) form 1 crystal structure refined to final value 
of Rwork!Réree = 24.4%/28.5% with 91.1% of residues in the favoured region of the 
Ramachandran plot, while the HipB-(O1-O2) form 2 structure had a final value 
of Rwork/Rfree = 24.8%/28.9% with 94.0% of residues in the most favoured region 
of the Ramachandran plot (Extended Data Table 1). 

Freshly grown crystals of the HipA-HipB-(O1-O2) complex only diffracted to 
10 A. However, crystals in drops that had dried out over time diffracted to beyond 
5 A. Hence, systematic dehydration efforts were initiated and resulted in signifi- 
cantly improved diffraction and the ability to develop a cryo-condition. Data were 
collected to 3.77 A resolution and the structure was solved by molecular replace- 
ment using the HipA-HipB-O1 structure. The structure could only be solved 
after removing one HipA (giving a HipA:HipB:DNA stoichiometry of one HipA 
subunit to one HipB dimer to one DNA duplex). Two solutions were obtained 
and used to construct the final model, which contained two HipA subunits, two 
HipB dimers and two O1 sites with the connecting spacer DNA. The model was 
minimally refined in Phenix after rigid body optimization*' (Extended Data 
Table 2). The HipA-HipB-(O2-O3) structure was solved by molecular replace- 
ment using the HipA-HipB dimer-O1 complex from the HipA-HipB-(O1-O2) 
structure. There were two HipA molecules and one HipB dimer in the asymmet- 
ric unit and half of the 57-bp oligonucleotide DNA. Crystallographic symmetry 
generated the full complex. Remarkably, electron density for the central 21 bp 
operator spacer region was mostly disordered. The structure was minimally 
refined in CNS” (Extended Data Table 2). 
Identification and analyses of hipBA promoter. To locate and map putative 
hipBA operators within the promoters of enteric bacteria, the Regulatory 
Sequence Analysis Tools (RSAT)* program was used. The operator-based 
TATCCNNNNNNNNGGATA sequence pattern was used as the search input. 
Fluorescence polarization binding studies. Fluorescence-polarization-based 
binding assays“ were performed in binding buffer consisting of 150 mM NaCl 
and 25 mM Tris-HCl pH 7.5. To assess the abilities of WT HipA, HipA(G22S), 
HipA(D88N) and HipA(P86L) to bind to the HipB-(O1-O2) complex, WT HipB 
was first titrated into the binding buffer containing 1 nM fluoresceinated O1-O2 
DNA until saturation. Increasing concentrations of HipA protein were then 


added to this complex. The data were plotted and fitted using KaleidaGraph. 
These analyses were conducted in triplicate (technical duplicates) with associated 
errors noted in the text. 

AFM imaging of HipB-DNA complexes. A linear 1,811 bp DNA containing the 
entire 5’ untranslated region and hipBA promoter, referred to as phipBA, was 
amplified using E. coli K12 MG1655 genomic DNA as a template and PCR 
primers (forward: 5’-ATAATAATACTCGAGTCACTTACTACCGTATTCT 
CGGCTTAA-3’; reverse: 5'-ATAATAATACATATGACCGTATAAGCCGC 
ATGTCGAGATGGC-3’). The gel-purified phipBA was subcloned into the 
pET15b vector after digesting with XhoI and Ndel. To obtain large quantities 
of the resulting 500 bp DNA, PCR was performed using the pET15b-phipBA as a 
template and the primers 5'-AAGTTTAGGCATTACCACTCC-3’ (forward) 
and 5'-ACCGTATAAGCCGCATGTCGA-3’ (reverse), respectively. The amp- 
lified 500 bp DNA was gel purified using a Qiaquick gel extraction kit and the 
sequence was verified by DNA sequencing. AFM was performed as described** 
whereby 20 1M HipA and HipB were mixed in a 1:1 ratio in binding buffer 
(20 mM Tris pH 7.5, 200 mM NaCl). The HipA—HipB complex was then mixed 
with the 500 bp DNA at 4:1 or 8:1 ratios and incubated at 23 °C for 10 min before 
AFM analysis. The AFM data were collected after a sample was deposited on 1-(3- 
aminopropyl)silatrane (APS) mica*® and incubated on this surface. Excess sample 
was washed off with de-ionized water and the resultant sample dried with argon 
gas. AFM images in air were acquired using a MultiMode AFM NanoScope IV 
system operating in tapping mode. Regular tapping-mode silicon probes with a 
spring constant of 42 N m7! and a resonant frequency between 300 and 320 kHz 
were used. 
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Extended Data Figure 1 | Validation of the persister phenotype of the 
hipAP86L mutant allele. Comparison of time-dependent survival of isogenic 
strains of E. coli to 100 1g ml! of ampicillin in exponential phase. MV7505 was 
the original EMS strain with the hipAP86L mutation, AhipA and WT hipA in 
the same background (MV7505 AhipA and MV7505 WT), MG1655 (WT) and 
known high-persister mutant (MG1655 hipA7). Overnight cultures of the 
strains were diluted 1:100 in fresh medium and, after 1.5 h of growth, treated 
with 100 pg ml! of ampicillin for 6h. Values are an average of at least three 
individual biological replicates; error bars, s.d. 
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Extended Data Figure 2 | Electron density maps for HipB-(O1-O2) and 
HipA-HipB-(O1-O2) promoter complexes. a, Structure of HipB-(O1-O2) 
complex showing the final refined structure and composite omit map (blue 
mesh), contoured at 0.96 and calculated to 3.35 A resolution. b, Fo — 


omit electron density map (blue mesh) contoured at 3.50 to 3.77 A resolution 
for the HipA-HipB-(O1-O2) complex in which the entire DNA molecule had 
been removed. The protein backbone is shown as white lines. 
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Extended Data Figure 3 | Higher-order HipB-hipBA and HipA-HipB- 
hipBA promoter complexes are extended. a, Comparison of HipB-(O1-O2) 
and HipA-HipB-(O1-O2) structures showing they adopt the same extended 
conformation. b, Schematic of the DNA site used in AFM experiments 
examining HipB interaction with O1-O2-O3-O4. The 5’ and 3’ DNA ends 
have an extra 88 bp and 299 bp, respectively, allowing for their differentiation. 
c, Possible models for HipB-promoter structures. Left: the wrapped model; 
right: the extended model, which has no cross-HipB contacts. In the wrapped 
model, the closely apposed HipB molecules would appear as a single ‘blob’ 


(indicated by dashed green line) bound to fish-hook DNA. In the extended, 
‘beads-on-a-string’ model, the HipB dimers bound to O1-O2 and 03-04 
would be close in space, even if on opposite faces of the DNA; hence, given the 
resolution of AFM, they would appear as single dots on extended DNA. d, AFM 
images of HipB bound to the DNA schematized in b. The right panel shows 
two magnified images. HipB dimers bound to closely spaced O1-O2 or O3-O4 
are observed as single dots (indicated by arrows), consistent with the extended 
model. The longer 3’ end is evident in these images. 
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Extended Data Figure 4 | P-loop ejection in HipA higher-order promoter _ the activation loops. When the pHipA molecules are overlaid onto the 
complexes. The HipA—HipB-(O1-O2) complex is coloured cyan and two promoter-bound HipA dimer, the ejected P-loops and the activation regions 
phosphorylated HipA (pHipA) molecules, in which Ser150 is phosphorylated from neighbouring molecules would clash unless they were to adopt a different 
(shown as spheres) and the loop ejected from the active site pocket, are coloured __ structure or conformation. 

yellow and beige. In pHipA the ejected P-loops stabilize the formation of 
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HipA-HipB-(01-02) 


Extended Data Figure 5 | HipA N-subdomains-1 mediate dimerization in 
the HipA-HipB-promoter complex. a, Comparison of apo HipA with 
promoter complexed HipA showing that the high-persistence hotspot region in 
HipA mediates dimerization. Left: ribbon diagram of apo HipA with 
N-subdomain-1 shown as a red surface, and the location of mutations causing 
high-persistence coloured grey and circled. Right: HipA—HipB-(O1-O2) 
promoter structure. DNA operator sites are coloured yellow, HipB dimers are 
coloured cyan and HipA molecules are coloured red and magenta. The red 
HipA is shown in the same orientation as the apo HipA to the left. The 


N-subdomain-1 high-persistence hotspot region is circled as in the apo 
structure. Note, this region forms the centre of HipA dimerization in the 
higher-order complex. b, Sigma-A-weighted 2F, — F, map showing a close up 
of residues in the HipA interface in the complex. The map is contoured at 0.80 
and calculated to 3.77 A. Left: close up of the locations of Pro86 and Asp88 in 
the dimer interface. Right: location of Gly22 in the structure. The two-fold 
related Gly22 residues directly abut, indicating that any residue other than 
glycine would not allow stable formation of this dimer. 
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HipA-DNA contacts 


Extended Data Figure 6 | HipA contacts to DNA in the HipA-HipB- HipA regions in proximity to DNA are strikingly electropositive. b, HipA- 
promoter complex. a, HipA-HipB-(O1-O2) complex with the DNA shown —HipB-(O1-O2) complex with DNA and HipB dimers shown and coloured as 
as an orange cartoon, HipB dimers as cyan ribbons and HipA molecules as in a. HipA molecules are shown as yellow ribbons, with DNA interacting 
electrostatic surface representations. The blue and red surfaces of HipA residues shown as spheres, coloured blue and labelled for one HipA subunit. 


correspond to electropositive and electronegative regions, respectively. The 
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21 bp connection 


O 
HipA-HipB-(O1-02) 


HipA-HipB-(02-03) 


Extended Data Figure 7 | HipA dimers form in the HipA-HipB-(O2-03) contoured at 1.00 to 3.99 A, shown in the same orientation as the structure 


complex. a, Structure of the HipA-HipB-(O2-O3) complex. The overall below. b, Comparison of the HipA-HipB-(O1-O2) and HipA-HipB-(O2- 
structure is shown at the bottom of the panel, with the two HipA molecules O03) complexes showing that they have identical higher-order structures in 
depicted as red lines and the HipB dimers as green lines. The DNA is shown as_ —_ which the HipA monomers (red and pink) are brought into proximity to form 
sticks. The O02 and O3 operators are connected by a 21 bp linker, which is the same dimer when complexed with HipB dimers and O1-O2 or 02-03 


disordered in the structure. This is illustrated by the 2F, — F. map (blue mesh) _ promoter regions. 
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hipBA systems in Gram-negative bacteria 
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Extended Data Figure 8 | Schematic showing the results of the bioinformatic _ are shown. The consensus HipB binding sequence is shown in red above the O1 
identification of the —35/—10 boxes of hipBA promoters in Gram-negative _ operator site of the E. coli (MG1655) hipBA promoter. The transcription 
bacteria. The predicted promoters, their operator arrangement and the start site (ATG) is also shown for reference. 

locations of the —35/—10 boxes for each bacterial species (labelled to the left) 
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Extended Data Table 1 | Crystallographic statistics: HipB promoter complexes 


Data collection 
Space group 
Cell dimensions 

a, b,c (A) 

a, B,y (°) 
Resolution (A) 
Rsym 
I/ol 
Completeness (%) 
Redundancy 


Refinement 

Resolution (A) 

No. reflections 

work! Reéree (%) 

R.m.s deviations 
Bond lengths (A) 
Bond angles (°) 


*Highest resolution shell is shown in parenthesis. 


HipB-(O1-O2)/form1 
P2,2)2) 


293.9,54.5,47.7 
90,90,90 
146.9-3.35 
0.092 (0.327)* 
8.0 (2.0) 

97.5 (97.2) 

3.0 (3.1) 


146.5-3.35 
11187 
24.4/28.5 


0.009 
1.27 


HipB-(O1-O2)/form2 
C2 


100.7,68.9,78.3 
90,93.1,90 
782-350 
0.123 (0.723) 
TSS 

97.0 (97.0) 

3.0 (2.8) 


78.2-3.50 
6749 
24.8/28.9 


0.014 
1.57 
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Extended Data Table 2 | Crystallographic statistics: HipA-HipB promoter complexes 


HipA-HipB-(O1-02) _ HipA-HipB-(O2-O3) 


Data collection 


Space group P4,2;2 P2,2;2; 
Cell dimensions 

a,b,c (A) 228 .2,228.2,130.8 214.0,146.8,53.8 

a, By (°) 90,90,90 90.90.90 
Resolution (A) 161.4-3.77 146.8-3.99 
Rey 0.100 (0.762)* 0.156 (0.608) 
I/ol 6.9 (1.4) 4.0 (1.6) 
Completeness (%) 99.8 (99.4) 94.2 (95.7) 
Redundancy 5.8 (3.9) 2.8 (2.9) 
Refinement 
Resolution (A) 161.4-3.77 146.8-3.99 
No. reflections 35140 14066 
R york Riree(%) 35.9/37.8 30.8/38 .7 
R.m.s deviations 

Bond lengths (A) 0.013 0.012 

Bond angles 1.48 1.67 


*Highest resolution shell is shown in parenthesis. 
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Influence maximization in complex networks 
through optimal percolation 


Flaviano Morone! & Hernan A. Makse! 


The whole frame of interconnections in complex networks hinges 
on a specific set of structural nodes, much smaller than the total 
size, which, if activated, would cause the spread of information to 
the whole network’, or, if immunized, would prevent the diffusion 
of a large scale epidemic*’. Localizing this optimal, that is, min- 
imal, set of structural nodes, called influencers, is one of the most 
important problems in network science*’. Despite the vast use of 
heuristic strategies to identify influential spreaders®*, the prob- 
lem remains unsolved. Here we map the problem onto optimal 
percolation in random networks to identify the minimal set of 
influencers, which arises by minimizing the energy of a many-body 
system, where the form of the interactions is fixed by the non- 
backtracking matrix’ of the network. Big data analyses reveal that 
the set of optimal influencers is much smaller than the one pre- 
dicted by previous heuristic centralities. Remarkably, a large num- 
ber of previously neglected weakly connected nodes emerges 
among the optimal influencers. These are topologically tagged as 
low-degree nodes surrounded by hierarchical coronas of hubs, and 
are uncovered only through the optimal collective interplay of all 
the influencers in the network. The present theoretical framework 
may hold a larger degree of universality, being applicable to other 
hard optimization problems exhibiting a continuous transition 
from a known phase’®. 

The optimal influence problem was initially introduced in the con- 
text of viral marketing’, and its solution was shown to be NP-hard* for 
a generic class of linear threshold models of information spreading’””*. 
Indeed, finding the optimal set of influencers is a many-body problem 
in which the topological interactions between them play a crucial 
role'*"*. On the other hand, there has been an abundant production 
of heuristic rankings to identify influential nodes and ‘superspreaders’ 
in networks*’*’. The main problem is that heuristic methods do not 
optimize a global function of influence. As a consequence, there is no 
guarantee of their performance. 

Here we address the problem of quantifying nodes’ influence by 
finding the optimal (that is, minimal) set of structural influencers. 
After defining a unified mathematical framework for both immuniza- 
tion and spreading, we provide its optimal solution in random net- 
works by mapping the problem onto optimal percolation. In addition, 
we present CI (Collective Influence), a scalable algorithm to solve 
the optimization problem in large-scale real data sets. The thorough 
comparison with competing methods (Supplementary Information 
section I’°) ultimately establishes the better performance of our algo- 
rithm. By taking into account collective influence effects, our optim- 
ization theory identifies a new class of strategic influencers, called 
‘weak nodes’, which outrank the hubs in the network. Thus, the top 
influencers are highly counterintuitive: low-degree nodes play a major 
broker role in the network, and despite being weakly connected, can be 
powerful influencers. 

The problem of finding the minimal set of activated nodes'”* to 
spread information to the whole network’ or to optimally immunize a 
network against epidemics"’ can be exactly mapped onto optimal per- 
colation (see Supplementary Information section IIB). This mapping 


provides the mathematical support to the intuitive relation between 
influence and the concept of cohesion of a network: the most influ- 
ential nodes are the ones forming the minimal set that guarantees a 
global connection of the network®"°. We call this minimal set the 
“optimal influencers’ of the network. At a general level, the optimal 
influence problem can be stated as follows: find the minimal set of 
nodes which, if removed, would break down the network into many 
disconnected pieces. The natural measure of influence is, therefore, the 
size of the largest (giant) connected component as the influencers are 
removed from the network. 

We consider a network composed of N nodes tied with M links with 
an arbitrary-degree distribution. Let us suppose we remove a certain 
fraction q of the total number of nodes. It is well known from percola- 
tion theory”' that, if we choose these nodes randomly, the network 
undergoes a structural collapse at a certain critical fraction where 
the probability of existence of the giant connected component 
vanishes, G = 0. The optimal influence problem corresponds to find- 
ing the minimum fraction q, of influencers to fragment the network: 
qc = min{q € [0, 1]: G(q) = O}. 

Let the vector n= (m,...,%y) represent which node is removed 
(n;=0, influencer) or left (n;=1, the rest) in the network 
(q=1—1/N >; nj), and consider a link from i to j (ij). The order 
parameter of the influence problem is the probability that i belongs 
to the giant component in a modified network where j is absent, v;_,; 
(refs 22, 23). Clearly, in the absence of a giant component we find 
{v;,; = 0} for all ij. The stability of the solution {v,,; = 0} is 
controlled by the largest eigenvalue (n; q) of the linear operator M, 


ae 
defined on the 2M X 2M directed edges as Mi_.0,;; = ee 


OVE-0 | {04-4 =0} 
We find for locally tree-like random graphs (see Fig. la and 
Supplementary Information section II): 


M61) = MB; (1) 


where By..¢;4; is the non-backtracking matrix of the network’. 
The matrix By..¢:.; has non-zero entries only when (k— f, i— j) 
form a pair of consecutive non-backtracking directed edges, that is, 
(k— €, € > )) with k ¥ j. In this case By_,¢¢4; = 1 (equation (13) in 
Supplementary Information). Powers of the matrix B count the num- 
ber of non-backtracking walks of a given length in the network 
(Fig. 1b)**, much in the same way as powers of the adjacency matrix 
count the number of paths*. Operator B has recently received a lot of 
attention thanks to its high performance in the problem of community 
detection**’°. We show its topological power in the problem of 
optimal percolation. 

Stability of the solution {v;_,; = 0} requires A(m; q) = 1. The optimal 
influence problem for a given q (=q.) can be rephrased as finding the 
optimal configuration n that minimizes the largest eigenvalue (n; q) 
(Fig. 1c). The optimal set n* of Nq, influencers is obtained when the 
minimum of the largest eigenvalue reaches the critical threshold: 


An"; gc) =1 (2) 
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Figure 1 | The non-backtracking (NB) matrix and weak nodes. a, The largest 
eigenvalue 2 of M exemplified on a simple network. The optimal strategy 
for immunization and spreading minimizes 2 by removing the minimum 
number of nodes (optimal influencers) that destroys all the loops. Left panel, 
the action of the matrix M is on the directed edges of the network. The entry 
M2335 =13B233-45 =n3 encodes the occupancy (n3 = 1) or vacancy 

(nz = 0) of node 3. In this particular case, the largest eigenvalue is 2 = 1. 
Centre panel, non-optimal removal of a leaf, 14 = 0, which does not decrease 4. 
Right panel, optimal removal of a loop, n3 = 0, which decreases J to zero. 

b, A NB walk is a random walk that is not allowed to return back along the 
edge that it just traversed. We show a NB open walk (€ = 3), a NB closed 
walk with a tail (€ = 4), and a NB closed walk with no tails ({ = 5). The NB 
walks are the building blocks of the diagrammatic expansion to calculate /. 

c, Representation of the global minimum over n of the largest eigenvalue A of 
M versus q. When q = q,, the minimum is at 2 = 0. Then, G = 0 is stable 
(still, non-optimal configurations exist with 2 > 1 for which G > 0). When 
q<4o the minimum of the largest eigenvalue is always 1 > 1, the solution 
G= 0 is unstable, and then G > 0. At the optimal percolation transition, the 
minimum is at n* with A(n", q.) = 1. For q = 0, we find A = k — 1(« = (YK), 
where k is the node degree) which is the largest eigenvalue of B for random 
networks” with all nodes present (n; = 1). When A = 1, the giant component is 
reduced to a tree plus one single loop (unicyclic graph), which is suddenly 
destroyed at the transition q. to become a tree, causing the abrupt fall of 2 

to zero. d, Ball(i, €) of radius € around node i is the set of nodes at distance € 
from i, and @Ball is the set of nodes on the boundary. The shortest path from i 
to j is shown in red. e, Example of a weak node: a node with a small number of 
connections surrounded by hierarchical coronas of hubs at different levels. 


The formal mathematical mapping of the optimal influence problem 
to the minimization of the largest eigenvalue of the modified non- 
backtracking matrix for random networks, equation (2), represents 
our first main result. 

An example of a non-optimized solution corresponds to choosing 
n; at random and decoupled from the non-backtracking matrix”*?’ 
(random percolation”, Supplementary Information section IID). 
In the optimized case, we seek to derandomize the selection of 
the set n; = 0 and optimally choose them to find the best configura- 
tion n* with the lowest q, according to equation (2). The eigen- 
value A(n) (from now on we omit q in A(m,q) =A(n), which is 
always kept fixed) determines the growth rate of an arbitrary 
vector Wo with 2M entries after € iterations of the matrix 
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M:|w7(n)| = (we|we)? =| Mwo| = (wo|(M)1-MA lwo) ~ el si, 
The largest eigenvalue is then calculated by the power method: 


(3) 


Equation (3) is the starting point of an (infinite) perturbation series 
that provides the exact solution to the many-body influence problem in 
random networks and therefore contains all physical effects, including 
the collective influence. In practice, we minimize the cost energy function 
of influence |w,(n)| in equation (3) for a finite €. The solution rapidly 
converges to the exact value as f >, the faster the larger the spectral 
gap. We find for €=1, to leading order in 1/N (Supplementary 
Information section IIE): 


N 
|we(n)|?= S> (ki—1) 


i=l je OBall(, 2¢—1) (I, 


where Ball(i, €) is the set of nodes inside a ball of radius € (defined as the 
shortest path) around node i, OBall(i, £) is the frontier of the ball, P(i, j) 
is the shortest path of length € connecting i and j (Fig. 1d), and k; is the 
degree of node i. 

The first collective optimization in equation (4) is € = 1. We find 
|w1(n)|? = a1 Agki— 1)(kj—1)ninj, where Aj is the adjacency 
matrix (equation (39) in Supplementary Information). This term is 
interpreted as the energy of an antiferromagnetic Ising model with 
random bonds in a random external field at fixed magnetization, 
which is an example of a pair-wise NP-complete spin-glass whose 
solution is found in Supplementary Information section III with the 
cavity method” (Extended Data Fig. 2). 

For €=2, the problem can be mapped exactly to a statistical 
mechanical system with many-body interactions which can be recast 
in terms of a diagrammatic expansion, equations (41)-(49) in Supple- 
mentary Information. For example, |w>(n)|* leads to 4-body interactions 
(equation (45) in Supplementary Information), and, in general, the 
energy cost |w/(n)|” contains 2¢-body interactions. As soon as ¢ = 2, 
the cavity method becomes much more complicated to implement and 
we use another suitable method, called extremal optimization (EO)” 
(Supplementary Information section IV). This method estimates the true 
optimal value of the threshold by finite-size scaling following extrapola- 
tion to £ — ~ (Extended Data Figs 3, 4). However, EO is not scalable to 
find the optimal configuration in large networks. Therefore, we develop 
an adaptive method, which performs excellently in practice, preserves 
the features of EO, and is highly scalable to present-day big data. 

The idea is to remove the nodes causing the biggest drop in the 
energy function, equation (4). First, we define a ball of radius ¢ around 
every node (Fig. 1d). Then, we consider the nodes belonging to the 
frontier OBall(i, €) and assign to node i the collective influence (CI) 
strength at level ¢ following equation (4): 


Ch()=(ki-1) S> (Kj D) (5) 


je eBall(i, 2) 


We notice that, while equation (4) is valid only for odd radii of the ball, 
CI¢(i) is defined also for even radii. This generalization is possible by 
considering an energy function for even radii analogous to equation 
(4), as explained in Supplementary Information section IIG. The case 
of one-body interaction with zero radius f= 0 (equation (59) in 
Supplementary Information) leads to the high-degree (HD) ranking 
(equation (62) in Supplementary Information)’”. 

The collective influence, equation (5), is our second and most 
important result since it is the basis for the highly scalable and opti- 
mized C] algorithm which follows. In the beginning, all the nodes are 
present: n; = 1 for all i. Then, we remove node i* with highest Cl¢ and 
set nj = 0. The degree of each neighbour of i* is decreased by one, and 
the procedure is repeated to find the new top CI node to remove. The 
algorithm is terminated when the giant component is zero (see 
Supplementary Information section V for implementation, and 


n) (kj—1) (4) 
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Figure 2 | Exact optimal solution and performance of CI in synthetic 
networks. a, G(q) in an ER network (N = 2 X 10°, (k) = 3.5, error bars are 
s.e.m. over 20 realizations). We show the true optimal solution found with EO 
(‘X? symbol), and also using CI, HDA, PR, HD, CC, EC and k-core methods. 
The other methods are not scalable and perform worse than HDA and are 
treated in Supplementary Information sections VI and VII (Extended Data 
Figs 8, 9). Cl is close to the optimal q°?' = 0.192(9) obtained with EO in 
Supplementary Information section IV. Note that EO can estimate the 
extrapolated optimal value of q., but it cannot provide the optimal 


Supplementary Information section VA for minimizing G(q) ~ 0). By 
increasing the radius ¢ of the ball we obtain better and better approx- 
imations of the optimal exact solution as f > - (for finite networks, ¢ 
does not exceed the network diameter). 

The collective influence Cl¢ for € = 1 has a rich topological content, 
and consequently tells us more about the role played by nodes in the 
network than the non-interacting high-degree hub-removal strategy at 
£=0, Cly. The augmented information comes from the sum in the 
right hand side of equation (5), which is absent in the naive high- 
degree rank. This sum contains the contribution of the nodes living 
on the surface of the ball surrounding the central vertex i, each node 
weighted by the factor k; — 1. This means that a node placed at the 
centre of a corona irradiating many links—the structure hierarchically 
emerging at different ¢ levels as seen in Fig. le—can have a very large 
collective influence, even ifit has a moderate or low degree. Such ‘weak 
nodes’ can outrank nodes with larger degree that occupy mediocre 
peripheral locations in the network. The commonly used word ‘weak’ 
in this context sounds particularly paradoxical. It is, indeed, usually 
used as a synonym for a low-degree node with an additional bridging 
property, which has resisted a quantitative formulation. We provide 
this definition through equation (5), according to which weak nodes 
are, de facto, quite strong. Paraphrasing Granovetter’s conundrum”, 
equation (5) quantifies the “strength of weak nodes”. 

The C]-algorithm scales as ~ O(N log N) by removing a finite frac- 
tion of nodes at each step (Supplementary Information section VB). 
This high scalability allows us to find top influencers in current big-data 
social media and the minimal set of people to immunize in large-scale 
populations at the country level. The applications are investigated next. 

Figure 2a shows the optimal threshold q, for a random Erdés-Reényi 
(ER) network° (marked by the vertical line) obtained by extrapolating 
the EO solution to N—>oo and £0 (Supplementary Information 
section IV). In the same figure we compare the optimal threshold against 
the heuristic centrality measures: high-degree (HD)’, high-degree 
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configuration for large systems. Inset, gq, (obtained at the peak of the second- 
largest cluster) for the three best methods versus (k). b, G(q) for a SF network 
with degree exponent y = 3, maximum degree k,,,x = 10°, minimum degree 
kmin = 2 and N = 2 X 10° (error bars are s.e.m. over 20 realizations). Inset, Ic 
versus y. The continuous blue line is the HD analytical result computed in 
Supplementary Information section IIG (Extended Data Fig. 1b). c, Example of 
SF network with y = 3 after the removal of 15% of nodes, using the three 
methods HD, HDA and CI. CI produces a much reduced giant component G 
(red nodes). 


adaptive (HDA), PageRank (PR)’, closeness centrality (CC)°, eigenvec- 
tor centrality (EC)°, and k-core’” (see Supplementary Information sec- 
tion I for definitions). Supplementary Information sections VI and VII 
show the comparison with the remaining heuristics®’’ and the Belief 
Propagation method of ref. 14, respectively, which have worse compu- 
tational complexity (and optimality), and cannot be applied to the net- 
work sizes used here. Remarkably, at the optimal value q. predicted by 
our theory, the best among the heuristic methods (HDA, PR and HD) 
still predict a giant component ~50-60% of the whole original network. 
Furthermore, the influencer threshold predicted by CI approximates 
very well the optimal one, and, notably, CI outperforms the other strat- 
egies. Figure 2b compares Cl in scale-free (SF) networks’ against the best 
heuristic methods, that is, HDA and HD. In all cases, CI produces a 
smaller threshold and a smaller giant component (Fig. 2c). 

As an example of an information spreading network, we consider 
the web of Twitter users (Supplementary Information section VIII’). 
Figure 3a shows the giant component of Twitter when a fraction q of its 
influencers is removed following CI. It is surprising that a lot of Twitter 
users with a large number of contacts have a mild influence on the 
network. This is witnessed by the fact that, when CI (at € = 5) predicts a 
zero giant component (and so it exhausts the number of optimal influ- 
encers), the scalable heuristic ranks (HD, HDA, PR and k-core) still 
give a substantial giant component of the order of 30-70% of the entire 
network. These heuristics also, inevitably, find a remarkably large num- 
ber of (fake) influencers, which is at least 50% larger than that predicted 
by CI (Fig. 3b and Supplementary Information section VIII). One cause 
for the poor performance of the high-degree-based ranks is that most of 
the hubs are clustered, which gives a mediocre importance to their 
contacts. As a consequence, hubs are outranked by nodes with lower 
degree surrounded by coronas of hubs (shown in detail in Fig. 3c), that 
is, the weak nodes predicted by the theory (Fig. le). 

Finally, we simulate an immunization scheme on a personal contact 
network built from the phone calls performed by 14 million people in 
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Figure 3 | Performance of CI in large-scale real social networks. a, Giant 
component G(q) of Twitter users’? (N = 469,013) computed using CI, HDA, 
PR, HD and k-core strategies (other heuristics have prohibitive running times 
for this system size). b, Percentage of fake influencers or false positives 

(PFI, equation (120) in Supplementary Information) in Twitter as a function of 
q, defined as the percentage of non-optimal influencers identified by the HD 
algorithm in comparison with CI. Below qh, PFI reaches as much as ~40%, 
indicating the failure of HD in optimally finding the top influencers. Indeed, to 
obtain G = 0, HD has to remove a much larger number of fake influencers, 
which at qi!” reaches PFI ~ 48%. c, An example of the many weak nodes found 
in Twitter. These crucial influencers were missed by all heuristic strategies. 

d, G(q) for a social network of 1.4 x 10’ mobile phone users in Mexico 
representing an example of big data to test the scalability and performance of 
the algorithm in real networks. CI immunizes this social network using half 
a million fewer people than the best heuristic strategy (HDA), saving ~35% 
of the vaccine stockpile. 


Mexico (Supplementary Information section IX). Figure 3d shows that 
our method saves a large number of vaccines or, equivalently, finds the 
smallest possible set of people to quarantine; our method therefore also 
outranks the scalable heuristics in large real networks. Thus, while the 
mapping of the influencer identification problem onto optimal per- 
colation is strictly valid for locally tree-like random networks, our 
results may apply also to real loopy networks, provided the density 
of loops is not excessively large. 

Our solution to the optimal influence problem shows its importance 
in that it helps to unveil hitherto hidden relations between people, as 
witnessed by the weak-node effect. This, in turn, is the by-product of a 
broader notion of influence, lifted from the individual non-interacting 
point of view*’*'°”° to the collective sphere: influence is an emergent 
property of collectivity, and top influencers arise from the optimiza- 
tion of the complex interactions they stipulate. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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Extended Data Figure 1 | High-degree (HD) threshold. a, HD influence 
threshold q, as a function of the degree distribution exponent y of scale-free 
networks in the ensemble with kmax = mN"/’~? and N-> ©. The curves refer 
to different values of the minimum degree m: 1 (red), 2 (blue), 3 (black). 

The fragility of SF networks (small q,) is notable for m = 1 (the case calculated 
in ref. 10). In this case (m = 1), the network contains many leaves, and reduces 
to a star at y = 2, which is trivially destroyed by removing the only single 
hub, explaining the general fragility in this case. Furthermore, in this same case, 
the network becomes a collection of dimers with k = 1 when y—> ©, which is 
still trivially fragile. This also explains why q.— 0 for y = 4. Therefore, the 
fragility in the case m = 1 has its roots in these two limiting trivial cases. 
Removing the leaves (m = 2) results in a 2-core, which is already more robust. 


For the 3-core m = 3, q. ~ 0.4-0.5 provides a quite robust network, and has the 
expected asymptotic limit to a non-zero q_ of a random regular graph with 
k=3asy>~, q.— (k — 2)/(k — 1) = 0.5. Thus, SF networks become 
robust in these more realistic cases, and the search for other attack strategies 
becomes even more important. b, HD influence threshold q. as a function of the 
degree distribution exponent of scale-free networks with minimum degree 

m = 2 in the ensemble where k,,,x is fixed and does not scale with N. The 
curves refer to different values of the cut-off kmax: 10° (red), 10° (green), 10° 
(blue), 10° (magenta), and kyax = 0 (black), and show that for a typical kyax 
degree of 10°, for instance in social networks, the network is fairly robust with 
qc ~ 0.2 for all y. The curve with m = 2 and kinax = 10° is replotted in the inset 
of Fig. 2b. 
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Extended Data Figure 2 | Replica Symmetry (RS) estimation of the and then averaged over 40 realizations of the network (error bars are s.e.m.). 
maximum eigenvalue. Main panel, the eigenvalue 2S), equation (92) in Inset, comparison between the RS cavity method and EO (extremal 
Supplementary Information for the two-body interaction € = 1, obtained by _ optimization) for an ER graph of (k) = 3.5 and N= 128. The curves are 
minimizing the energy function €(s) with the RS cavity method. The curve was _ averaged over 200 realizations (error bars are s.e.m.). 

computed on an ER graph of N = 10,000 nodes and average degree (k) = 3.5 
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Extended Data Figure 3 | EO estimation of the maximum eigenvalue. each panel refer to different sizes of ER networks with average connectivity 
Eigenvalue 1(q) obtained by minimizing the energy function €(n) with TEO (k) = 3.5. Each curve is an average over 200 instances (error bars are s.e.m.). 
(t-extremal optimization), plotted as a function of the fraction of removed The value q. where A(q,) = 1 is the threshold for a particular N and many-body 


nodes q. The panels are for different orders of the interactions. The curves in _ interaction. 
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Extended Data Figure 4 | Estimation of optimal threshold q°?' with EO. 
a, Critical threshold q, as a function of the system size N, obtained with EO from 
Extended Data Fig. 3, of ER networks with (k) = 3.5 and varying size. The 
curves refer to different orders of the many-body interactions. The data show a 
linear behaviour as a function of N-~”, typical of spin glasses, for each many- 


fit ----- 4 
1-body (HD) 
qc? =0.310 

~~ 
2-body 
qt =0.251 
0.6 0.8 1 


1/p 


body interaction p. The extrapolated value q* (p) is obtained at the y intercept. 
b, Thermodynamic critical threshold q2°(p) as a function of the order of the 
interactions p from a. The data scale linearly with 1/p. From the y intercept 
of the linear fit we obtain the thermodynamic limit of the infinite-body 
optimal value q°?' = 4° (p > 00) =0.192(9). 
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Extended Data Figure 5 | Comparison of the CI algorithm for different modified NB matrix controls the stability of the solution G = 0, and not the 
radii ¢ of the Ball(€). We use ¢ = 1, 2,3, 4,5, ona ER graph with average degree _ stability of the solution G > 0. In the region where G > 0 we use a simple 

(k) = 3.5 and N = 10° (the average is taken over 20 realizations of the network, and fast procedure to minimize G explained in Supplementary Information 
error bars are s.e.m.). For € = 3 the performance is already practically section VA. This explains why there is a small dependence on having a slightly 
indistinguishable from f = 4, 5. The stability analysis we developed to larger G for larger £, when G> 0 in the region q ~ 0.15. 

minimize q, is strictly valid only when G = 0, since the largest eigenvalue of the 
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Extended Data Figure 6 | Illustration of the algorithm used to minimize in the network. For example, the red node has c(red) = 2, while the blue one has 
G(q) for q< q.. Starting from the completely fragmented network at q=q., _c(blue) = 3. The node with the smallest c(i) is reinserted in the network: in this 
the Nq, influencers are reinserted with their original degree and connected case the red node. Then the c(i)s are recalculated and the new node with the 
to their original neighbours with the following criterion: each node is assigned _ smallest c(i) is found and reinserted. These steps are repeated until all the 
and index c(i) given by the number of clusters it would join if it were reinserted removed nodes are reinserted in the network. 
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Extended Data Figure 7 | Test of the decimation fraction. Giant component G as a function of the fraction of removed nodes q using CI, for an ER network of 
N= 10° nodes and average degree (k) = 3.5. The profiles of the curves are drawn for different percentages of nodes fixed at each step of the decimation algorithm. 
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Extended Data Figure 8 | Comparison of the performance of CI, BC and EGP in destroying G. We also include HD, HDA, EC, CC, k-core and PR. We use a 
scale-free (SF) network with degree exponent y = 2.5, average degree (k) = 4.68, and N = 10°. We use the same parameters as in ref. 11. 
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Extended Data Figure 9 | Comparison with BP for a network 
immunization. a, Fraction of infected nodes f as a function of the fraction of 
immunized nodes q in the susceptible-infected-removed (SIR) model from the 
BP solution. We use an ER random graph of N = 200 nodes and average degree 
{k) = 3.5. The fraction of initially infected nodes is p = 0.1 and the inverse 
temperature / = 3.0. The profiles are drawn for different values of the 
transmission probability w: 0.4 (red curve), 0.5 (green), 0.6 (blue), 0.7 
(magenta). Also shown are the results of the fixed density BP algorithm 
(open circles). b, Chemical potential yz as a function of the immunized nodes 
q from BP. We use an ER random graph of N = 200 nodes and average degree 
(k) = 3.5. The fraction of the initially infected nodes is p = 0.1 and the 
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inverse temperature / = 3.0. The profiles are drawn for different values of 
the transmission probability w: 0.4 (red curve), 0.5 (green), 0.6 (blue), 0.7 
(magenta). Also shown are the results of the fixed density BP algorithm 
(open circles) for the region where the chemical potential is non-convex. 

c, Comparison between the giant components obtained with CI], HDA, HD and 
BP. We use an ER network of N = 10° and (k) = 3.5. We also show the solution 
of CI from Fig. 2a for N= 10°. We find in order of performance: CI, HDA, 
BP and HD. (The average is taken over 20 realizations of the network, error bars 
are s.e.m.) d, Comparison between the giant components obtained with CI, 
HDA, HD and BPD. We use a SF network with degree exponent y = 3.0, 
minimum degree kin = 2, and N= 10* nodes. 
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Extended Data Figure 10 | Fraction of infected nodes f{q) as a function of | We compare CI, HDA and BP. All strategies give similar performance, owing 
the fraction of immunized nodes q in SIR from BP. We use the following _to the large value of the initial infection p, which washes out the optimization 
parameters: initial fraction of infected people p = 0.1, and transmission performed by any sensible strategy, in agreement with the results shown in 
probability w = 0.5. We use an ER network of N = 10° nodes and (k) = 3.5. figure 12a of ref. 14. 
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Beating the Stoner criterion using molecular 


interfaces 
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Gavin Burnell’, Bryan J. Hickey! & Oscar Cespedes! 


Only three elements are ferromagnetic at room temperature: the 
transition metals iron, cobalt and nickel. The Stoner criterion 
explains why iron is ferromagnetic but manganese, for example, 
is not, even though both elements have an unfilled 3d shell and are 
adjacent in the periodic table: according to this criterion, the prod- 
uct of the density of states and the exchange integral must be 
greater than unity for spontaneous spin ordering to emerge’. 
Here we demonstrate that it is possible to alter the electronic states 
of non-ferromagnetic materials, such as diamagnetic copper and 
paramagnetic manganese, to overcome the Stoner criterion and 
make them ferromagnetic at room temperature. This effect is 
achieved via interfaces between metallic thin films and Cg9 molecu- 
lar layers. The emergent ferromagnetic state exists over several 
layers of the metal before being quenched at large sample thick- 
nesses by the material’s bulk properties. Although the induced 
magnetization is easily measurable by magnetometry, low-energy 
muon spin spectroscopy’ provides insight into its distribution by 
studying the depolarization process of low-energy muons 
implanted in the sample. This technique indicates localized spin- 
ordered states at, and close to, the metal-molecule interface. 
Density functional theory simulations suggest a mechanism based 
on magnetic hardening of the metal atoms, owing to electron 
transfer**. This mechanism might allow for the exploitation of 


molecular coupling to design magnetic metamaterials using 
abundant, non-toxic components such as organic semiconductors. 
Charge transfer at molecular interfaces may thus be used to control 
spin polarization or magnetization, with consequences for the 
design of devices for electronic, power or computing applications 
(see, for example, refs 6 and 7). 

Multifunctional materials with the spin degree of freedom, such as 
multiferroics, magnetic semiconductors and molecular magnets, have 
aroused interest as potentially transformative components in quantum 
technologies*"’. Strategies used to bring magnetic ordering to these 
materials typically rely on the inclusion of magnetic transition metals, 
heavy elements with a large atomic moment or rare earths. In thin-film 
structures, proximity effects and coupling at interfaces have an essential 
role in determining magnetic and transport characteristics of the struc- 
tures'*™*, This is especially the case for molecular spintronics'*’*, where 
organic thin films grown on Cu have demonstrated spin filtering'’. The 
organic-magnetic coupling can propagate for long distances in systems 
such as nanoscale vortex-like configurations or nanoskyrmion lattices’*. 

We choose C¢g as a model molecule, owing to its structural simpli- 
city and robustness as well as its high electron affinity. C¢/transition- 
metal complexes exhibit strong interfacial coupling between metal 3d, 
electrons and molecular n-bonded p electrons. The potential created 
by the mismatch of molecular and metal work functions leads to a 


C,/Cu 
interface 
region 


20 4 


Figure 1 | Effect of molecular interfaces. 
Schematics and room-temperature magnetization 
for a Ta(5)/[Ceo(15)/Cu(2)] X5/Al(5) and a 
Ta(5)/[Cgo(15)/Al(3)/Cu(2)/Al(3)] <5 sample; the 
numbers in parentheses are the film thicknesses in 
4 nanometres. The Cu-to-Cgo charge transfer and 
interface reconstruction results in substantial 
changes in the density of states (DOS) of the 

_| — metallic film and a band splitting that leads to 
magnetic ordering. On the other hand, an Al spacer 
between both materials screens the charge transfer 
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partial filling of the interface states'”*’. Other molecules with close 
electron affinity and the potential for 3d,-p coupling could be used to 
similar effect. In the case of Cgg on metallic substrates such as Cu films, 
the charge transfer from the metal can be of up to three electrons per 
molecule and leads to a metallization of the interface’. Magnetic 
polarization in fullerenes induced by spin injection or charge transfer 
may extend for long distances, owing to low spin-orbit coupling and 
the absence of a hyperfine interaction”. 

In the metal, it is expected that the charge transfer will be quickly 
screened by free electrons. A priori, there is no reason to suspect that a 
spin-unpolarized molecule would change the magnetic state of a 
metallic film. However, we find that the charge transfer and surface 
reconstruction at the interface” can lead to an emergent magnetization 
in both the metal and the molecule. Magnetometry measurements of 
Ceo/Cu and C¢o/Mn multilayers show hysteresis at room temperature. 
The magnetization disappears when all the transition-metal-molecular 
interfaces are decoupled via an Al or Al,O; spacer layer (Fig. 1). 

Changes in the density of states (DOS) of the metal may be larger 
close to the interface, but should be screened deeper within the mater- 
ial. If the film is thick enough, the bulk properties of the metal are 
expected to dominate and quench the magnetization. This effect is 
shown in Fig. 2: the magnetization of Cg9/Cu and Cgo/Mn multilayers 
decays once the metallic-film thickness exceeds 2-3 nm. Decreasing 
the coupling between the top and bottom interfaces of a metal layer 
may also play a role in quenching the magnetization. 

The magnetization of Cgo/Cu samples is 3-4 times stronger than 
that of Cgo/Mn, which is probably due to the better lattice matching 
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Figure 2 | Room-temperature magnetization for Cu and Mn films. Dashed 
lines are exponentially modified Gaussian fits. Error bars in thickness constitute 
the film roughness and in magnetization are calculated as the standard error 
of the mean. a, Dependence of the magnetization on the Cu-film thickness 
for a total of 145 samples with the structure Ta(5)/[Ceo(10-20)/Cu(¢)/ 
C6o(10-20)] X (1-5)/Al(5). Films with t < 1-1.5 nm are discontinuous. Inset, 
magnetic moment versus the number of Cgo(15)/Cu(2.5) interfaces; they are 
roughly proportional. b. As for a, but for Mn, with 96 samples measured. The 
magnetization in Mn films is smaller than in Cu films, but propagates for a 
longer distance. Inset, out-of-plane and in-plane magnetization measurements 
of a [Cgo(15)/Mn(2.5)] X4 sample as a function of magnetic field strength H. 
Ha, magnetic field strength at which the magnetization saturates; emu, 
electromagnetic unit; 4g, Bohr magneton. 
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and larger charge transfer between Cu and Ceo (ref. 19). However, 
bulk Mn is paramagnetic and much closer to complying with the 
Stoner criterion than is diamagnetic Cu, owing to the larger exchange 
interactions and DOS at the Fermi level (DOS(E,))**. This property 
may be correlated with the propagation length of the effect, which is 
five times greater in Mn than in Cu. Cgo/Mn multilayers also show a 
larger paramagnetic slope of the magnetization than does decoupled 
Mn; see Supplementary Figs 1-4. Both systems exhibit anisotropy with 
an easy axis that lies in the plane of the film, and out-of-plane satura- 
tion fields of about 10-15 kOe at room temperature (inset of Fig. 2b). 
The Cu and Mn samples degrade with time, and the magnetization 
drops over several days or weeks depending on the layer structure and 
protective cap used (Supplementary Fig. 5). 

To explore the dependence of magnetization on interfacial coupling, 
we fabricated samples with different numbers of Cgo(15)/Cu(2.5) junc- 
tions (the numbers in parentheses are the film thicknesses in nano- 
metres). The magnetic moment of these multilayers is proportional to 
the number of Cg,/Cu interfaces, suggesting that the magnetism is due 
to molecular coupling (inset of Fig. 2a). However, the amount of Cu 
and C¢p also increases as we grow more layers. To ensure that the 
magnetization is not simply proportional to the amount of material 
deposited, we performed a related set of measurements where the total 
sample thickness is kept constant: 9nm of Cu and 81 nm of Cgo, but 
split into different numbers of Cgo/Cu repeats (Supplementary Fig. 6). 
In this case, the magnetism also increases with the number of inter- 
faces; for example, the magnetic moment of [C¢9(16.2)/Cu(1.8)] X5 is 
greater than the magnetic moment of [C¢9(27)/Cu(3)] x3. However, 
trying to split the sample into Cu films that are < 1.5 nm thick results 
in discontinuous layers and a drop in the magnetization. This thick- 
ness and interface dependence of the magnetization could not arise 
from contaminants, and X-ray spectroscopy did not show the presence 
of impurities (Supplementary Figs 7-11). 

Magnetometry measurements show that the magnetization is 
dependent on the thickness of the metal, but not on the thickness of 
the molecular film, as long as the latter is continuous and smooth 
(about 10-20nm thick). However, magnetometry by itself cannot 
determine where the magnetization is located or how much of it corre- 
sponds to each material. Conversely, low-energy muon spin rotation 
(uSR) provides a magnetic profile of the sample” and has been applied 
successfully to other metallo-molecular systems’. Here, a beam of 
almost fully polarized positive muons is moderated to keV energies 
so that their tunable stopping range is tens to hundreds of nanometres. 
The local polarization at the positive-muon stopping depth is probed 
through the detection of decay positrons, preferentially emitted along 
the muons’ spin direction. 

We use this technique to study two samples: multilayer A is a 
magnetic sample with the structure (from bottom to top) Ta(5)/ 
Coo(20)/Cu(2.5)/Ceo(50)/Au(10); multilayer B is a decoupled, diamag- 
netic reference sample with Al,O3; layers in between the Cu and Ceo 
(Fig. 3a). The uppermost gold film slows injected positive muons and 
protects the inner layers from oxidation. The total sample structure is 
designed to allow the active layers to be probed with a range of access- 
ible positive-muon energies and to maximize the stopping profile at the 
regions of interest, that is, close to the Cgo/Cu interface. The Cu thick- 
ness is chosen to obtain the highest magnetization (Fig. 2a). Muon 
stopping profiles and further experimental details can be found in 
Supplementary Information section S.3 and Supplementary Figs 12-13. 

Muons with 4 keV implantation energy probe the identical, upper- 
most Cgo(50)/Au(10) layers of both samples. Nevertheless, the zero- 
field wSR measurements at 250 K demonstrate a significant difference 
(P < 0.001) in the polarization of the implanted muons: for the mag- 
netic multilayer-A sample, a fraction of approximately 10% of the 
muon-spin polarization is rapidly lost, which indicates that about 
10% of the region sampled by the muons is affected by the magnetism 
(Fig. 3b). This result points to additional sources of magnetic flux in 
the multilayer-A sample. In this (uppermost C¢o(50)/Au(10)) region, 
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Figure 3 | Muon spin rotation (SR) spectroscopy at 250K. Error bars are 
the standard error of the mean in about 10° events. a, Schematic of the 
experiment and the samples measured: multilayer A, [Ta(5)/Cgo(20)/Cu(2.5)/ 
C¢0(50)/Au(10)]; and the control, multilayer B, [Ta(5)/Cgo(20)/Al,03(4)/ 
Cu(2.5)/Al,03(4)/Ceo(50)/Au(10)]. Far-right, raw hysteresis loops; the 
horizontal scale bar represents 1 kOe and the vertical scale bar represents 

100 emu per cm? of Cu in multilayer A or B, as appropriate. b, Zero-field uSR 
spectra for 4keV, 6 keV, 8keV and 16 keV muon implantation energies. 
Multilayer B is plotted in red; multilayer A is plotted in blue. Tables show the 
fraction of positive muons (1*) stopped in each layer in each sample. At 8 keV, 
zero-field spectra are shown before (closed symbols) and after (open symbols, 
remanent state) a magnetic field with a strength of 300 Oe was applied, 
evidencing clear differences in the multilayer-A sample. c, Oscillation 


the additional flux most probably arises from stray dipolar fields, since 
at low temperatures (20 K) we find that 75% of the positive muons 
implanted in the Cgo layer form a bound electron—-muon state called 
muonium, which is observed in a non-magnetic environment”. This 
observation strongly indicates that the Cg, layer is, for the most part, 
free of magnetic moments, and suggests that the magnetism is loca- 
lized at the Cu/Cgo interface. Owing to the presence of muonium in 
Cgo, the data analysis is difficult, but further support for this scenario 
comes from the energy/depth dependence of the SR data at low and 
high temperatures. At 20K, the observable muonium fraction 
decreases in the magnetic multilayer-A sample as the Cu layer is 
approached (Supplementary Fig. 14). Analogously, at 250K, the dif- 
ference between the spectra of the two samples increases for the 6-keV 
data and is even greater at 8 keV, the energy at which the muons most 
heavily sample the Cu layer (Fig. 3b). If the Cu were non-magnetic, 
then one would expect an overall increase of the muon polarization, 
which is not observed. 

Another means of locating the magnetism in the multilayer-A sam- 
ple is to study its wSR response in the zero-field remanent state. Both 
samples contain an oscillation at 0.4 MHz, owing to the muonium 
formed in semiconducting Cgo. After applying an external field of 
300 Oe, this signal is shifted to 0.6 MHz, which we attribute to a small 


frequencies from fits to the data plotted in b. At remanence, a new signal f, at 
approximately 0.75 MHz is observed in the multilayer-A sample, in addition to 
the signal f, at 0.6 MHz that is observed for both samples, with f, attributed to 
the emergent magnetization. d, The polarization (signal) amplitude of the 
magnetic remanent signal at 0.75 MHz tracks the fraction of muons stopped in 
the Cu (maximum at 8 keV), whereas the signal associated with muonium at 
0.6 MHz is anti-correlated to it. The markers correspond to the signal 
amplitude (measured by the right-hand axis) for the zero-field spectra at the 
given energy E. The background shading corresponds to the fraction of positive 
muons (j1") stopped in the given layer in multilayer A, beginning with 
backscattered muons, which do not decay in the film, and ending with those 
muons that enter the Ta seed layer; the shading for the Cu, Ceo and Au layers 
corresponds to the data shown in blue in the inset tables in b. 


residual field of approximately 0.3 Oe in the apparatus. Nonetheless, 
the remanent 8-keV SR spectra shown in Fig. 3b are clearly different 
from the virgin spectra for the magnetic multilayer-A sample, whereas 
only subtle changes are observed for the non-magnetic reference sam- 
ple multilayer B. The new feature related to the remanent state of 
multilayer A is an additional oscillation at approximately 0.75 MHz, 
which is not observed for the non-magnetic multilayer B (Fig. 3c). This 
new frequency is explained by an additional magnetic field of 0.1 Oe at 
the muonium site in close proximity to the Cu layer. The amplitude 
of this signal follows the fraction of positive muons stopped in the 
Cu layer (Fig. 3d), whereas the non-magnetic signal at 0.6 MHz is anti- 
correlated to this fraction. Altogether, the low-energy SR data fully 
support the notion of a magnetic moment being localized in the metal- 
lic layer and the immediate Cu/Cgo interface. 

To search for the origins of the induced magnetization, we modelled 
the Cu/Cgo interface using density functional theory (DFT); see 
Supplementary Information section S.4. The molecular roughness of 
the Co films has been accounted for via several interface models based 
on: (1) the single crystal 7-vacancy Cu(111) reconstruction”; (2) Céo 
encapsulation into adsorbed Cu(111) films (Cu{Cgo}); and (3) Cu(111) 
growth into the pits of the Cg film (Cgo{Cu}); see Fig. 4a. Regardless 
of the adopted model variant (Supplementary Figs 15-19), we 
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Figure 4 | DFT simulations and metamagnetic modellling. a, Schematic 
of the molecularly rough Cu/Cgo interface with: atomically flat Cgo/Cu contacts 
(green square); C¢o inclusion in Cu film (Cu{Cgo}; red square); and Cu inclusion 
in pits of the C¢o film (Cgo{Cu}; black square). The optimized atomic 
structures are reported and labelled in Supplementary Figs 15-19. b, Total DOS 
per electronic spin state as a function of energy around the Fermi level 
(DOS(E — Ef)) for bulk Cu and the Cu/C,o interface models. In the Cu{Cgo} 
models, ‘-1.5’, ‘-2.0’, ‘-2.5° and ‘-3.0’ indicate the initial Cu-C distance 

(in angstroms); the Cu{Cgo}-1.5 model with 55 Cu vacancies is denoted 
Cu{Cgo}-1.5 (55v). In the atomically flat Cs9/Cu contact models, *-3L’, “-5L’ and 


find a non-magnetic ground state for all the considered interfaces. 
With the exception of the thicker Cgo/slab (‘slab’ refers to a finite, 
non-monoatomic layer) interfaces and the Cu{Cgo} models that are 
prepared with short (1.5-2 A) initial Cu-C distances, all the models 
exhibit positive curvature in DOS(Ep); see Fig. 4b. Within the mean- 
field itinerant-electron model (Supplementary Fig. 20), convex 
DOS(Eg) may lead, for sufficiently high external magnetic fields, to a 
spontaneous first-order paramagnetic-to-ferromagnetic metamag- 
netic transition. For the computed DOS(Eg) of the Cu/Cgo interface 
models, the critical magnetic field strength (H.) for the metamagnetic 
transition sharply decreases with increasing values of the Stoner 
exchange integral (I;) according to [1 — I,DOS(E,)]*”7; see Fig. 4d. 
Atom-resolved analysis of Is; reveals a change in the exchange 
strength at the Cu/Cgo interfaces of up to a factor of four (Fig. 4c). 
Magnetic hardening by up to a factor of three has been previously 
reported for magnetic cobalt atoms contacted to m-conjugated mole- 
cules’. The computed values of Is; (from 0.86 eV for bulk Cu to up to 
more than 2.5 eV for interfacial Cu atoms) suggest viable paramag- 
netic-to-ferromagnetic metamagnetic transitions for field strengths 
lower than 1 kOe for thin Cu layers (Fig. 4d). On this basis, we attribute 
the measured ferromagnetism to a transition of the Cu/Cgo system in 
magnetic fields with strengths of 0.3-5 kOe generated during sample 
deposition and preparation (see Methods). Our DFT calculations pre- 
dict that 77-95% of the magnetization in the Cu/Cgo system will be 
distributed in the metal (Supplementary Table 8 and Supplementary 
Figs 21-26), in good agreement with the muon spectroscopy data. 
Although the substantial electron transfer from the Cu layers to Cg 
(21.6 electrons per Ceo molecule, depending on the model; Supple- 
mentary Table 7) is effective in altering the curvature of DOS(Ep) 
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*-10L’ indicate the number of Cu layers in the model. In the Cgo{Cu} models, 
‘-capped’ refers to a structure that is topped with a continuous Cu layer. 

c, Atom-resolved analysis of the Stoner exchange integral Is for the Cu atoms as 
a function of the shortest Cu-Cgp distance. I; for bulk Cu is 0.86 eV. d, The 
critical magnetic field strength (H_) for spontaneous ferromagnet 
metamagnetic transition as a function of Is for the models computed to have 
positive curvature at DOS(E,) in b. The horizontal black line marks a 

typical magnetic field strength during sample preparation. Shaded area 
represents the possible values for I; in the different geometries, with the bulk 
value of 0.86 eV indicated by the vertical black line. 


and increasing Is, the calculated DOS(E;) < Is; product remains less 
than unity (Supplementary Table 6), which does not fulfil the Stoner 
criterion. However, despite not satisfying the Stoner criterion in the 
ground state, magnetometry and muon spectroscopy presented here 
provide conclusive evidence for the emergence of magnetism at Cu/ 
Co interfaces. This is probably associated with a sharp decrease of five 
orders of magnitude in the ferro-metamagnetic critical field strength 
H_ as Ig increases that is made possible by Cgg-induced magnetic hard- 
ening of Cu. Similar effects due to charge transfer could also take place 
in other hybrid metallo-organic!” and d° magnetic systems*’. To max- 
imize this effect, it should be possible to look for molecules with large 
electron affinity such as polyoxometalates and metals with a large 
exchange integral such as zinc. However, good band and structural 
matching is needed to obtain noticeable results. Manipulating the 
charge transfer by applying electric potentials or using energy band 
matching may lead to applications in molecular memories or devices 
such as spin capacitors. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 11 December 2014; accepted 22 May 2015. 


1. Stoner, E. C. Collective electron ferromagnetism. Proc. R. Soc. London Ser. A 165, 
372-414 (1938). 

2. Stoner, E. C. Collective electron ferromagnetism. Il. Energy and specific heat. Proc. 
R. Soc. London Ser. A 169, 339-371 (1939). 

3. Drew,A.J.eta/. Direct measurement of the electronic spin diffusion length ina fully 
functional organic spin valve by low-energy muon spin rotation. Nature Mater. 8, 
109-114 (2009). 

4. Vandewal, K. et al. Efficient charge generation by relaxed charge-transfer states at 
organic interfaces. Nature Mater. 13, 63-68 (2014). 


©2015 Macmillan Publishers Limited. All rights reserved 


Callsen, M., Caciuc, V., Kiselev, N., Atodiresei, N. & Bluegel, S. Magnetic hardening 
induced by nonmagnetic organic molecules. Phys. Rev. Lett. 111, 106805 (2013). 
Moodera, J. S., Koopmans, B. & Oppeneer, P. M. On the path toward organic 
spintronics. MRS Bull. 39, 578-581 (2014). 

Raman, K. V. Interface-assisted molecular spintronics. Appi. Phys. Rev. 1,031101 
(2014). 
Beeler, M. C. et al. The spin Hall effect in a quantum gas. Nature 498, 201-204 
(2013). 
Eerenstein, W., Mathur, N. D. & Scott, J. F. Multiferroic and magnetoelectric 
materials. Nature 442, 759-765 (2006). 


. Powell, A. K. Molecular magnetism: a bridge to higher ground. Nature Chem. 2, 


351-352 (2010). 


. Geng, Y. et al. Direct visualization of magnetoelectric domains. Nature Mater. 13, 


163-167 (2014). 


. Warner, M. et a/. Potential for spin-based information processing in a thin-film 


molecular semiconductor. Nature 503, 504-508 (2013). 
accherozzi, F. et al. Evidence for a magnetic proximity effect up to room 
temperature at Fe/(Ga,Mn) As interfaces. Phys. Rev. Lett 101, 267201 (2008). 


. Vobornik, |. et a/. Magnetic proximity effect as a pathway to spintronic applications 


of topological insulators. Nano Lett 11, 4079-4082 (2011). 


. Barraud, C. et al. Unravelling the role of the interface for spin injection into organic 


semiconductors. Nature Phys. 6, 615-620 (2010). 


. Sanvito, S. Molecular spintronics: the rise of spinterface science. Nature Phys. 6, 


562-564 (2010). 


. Raman, K. V. et al. Interface-engineered templates for molecular spin memory 


devices. Nature 493, 509-513 (2013). 


. Brede, J. et a/. Long-range magnetic coupling between nanoscale organic-metal 


hybrids mediated by a nanoskyrmion lattice. Nature Nanotechnol. 9, 1018-1023 
(2014). 


. Pai, W. W. et al. Optimal electron doping of a Ceég monolayer on Cu(111) via 


interface reconstruction. Phys. Rev. Lett. 104, 036103 (2010). 


. Xu, G. etal. Detailed low-energy electron diffraction analysis of the (4 x 4) surface 


structure of Cgg on Cu(111): seven-atom-vacancy reconstruction. Phys. Rev. B 86, 
075419 (2012). 


. Tamai, A. et al. Electronic structure at the Cgo/metal interface: an angle-resolved 


photoemission and first-principles study. Phys. Rev. B 77, 075134 (2008). 


. Cho,S. W. etal. Origin of charge transfer complex resulting in ohmic contact at the 


Cé0/Cu interface. Synth. Met. 157, 160-164 (2007). 


. Zhang, X. et al. Observation of a large spin-dependent transport length in organic 


spin valves at room temperature. Nature Commun. 4, 1392 (2013). 


LETTER 


24. Moorsom, T. et al. Spin-polarized electron transfer in ferromagnet/C¢o interfaces. 
Phys. Rev. B 90, 125311 (2014). 

25. Tseng, T.-C. eta/. Charge-transfer-induced structural rearrangements at both sides 
of organic/metal interfaces. Nature Chem. 2, 374-379 (2010). 

26. Janak, J. F. Uniform susceptibilities of metallic elements. Phys. Rev. B16, 255-262 
(1977). 

27. Morenzoni, E. et al. Implantation studies of keV positive muons in thin metallic 
layers. Nucl. Instrum. Methods B192, 254-266 (2002). 

28. Ansaldo, E. J., Niedermayer, C. & Stronach, C. E. Muonium in fullerite. Nature 353, 
121 (1991). 

29. Duty, T. L. et al. Zero-field SR in crystalline Ceo. Hyperfine Interact. 86, 789-795 
(1994). 

30. Coey, J. M. D. d° ferromagnetism. Solid State Sci. 7, 660-667 (2005). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements This work was supported by the Engineering and Physical 
Sciences Research Council through grants EP/KO0512X/1, EP/KO36408/1, EP/ 
JO1060X/1 and EP/I004483/1. Use of the N8 POLARIS (EPSRC EP/K000225/1), 
ARCHER (via the UKCP Consortium, EP/KO13610/1), and the High Performance 
Computing (HPC) Wales facilities is acknowledged. Use of the National Synchrotron 
Light Source, Brookhaven National Laboratory, was supported by the US Department 
of Energy, Office of Science, Office of Basic Energy Sciences, under contract number 
DE-AC02-98CH10886. 


Author Contributions F.A.M. and T.M. grew and characterized the samples, conducted 
the magnetometry and uSR, and contributed to the data analysis; G.T. performed and 
analysed the DFT simulations; W.D. grew and characterized the Cu-Cgg multilayers; 
T.P.,H.L, S.L.and M.F. contributed to the design, measurement and analysis of the uSR 
experiments; D.A.M. contributed to the TEM images and structural analysis; G.E.S. and 
D.AA. performed the X-ray magnetic circular dichroism and X-ray absorption 
spectroscopy measurements; M.A., M.C.W., G.B. and B.J.H. contributed to the sample 
structure and measurement setup; and O.C. designed the study, analysed the data and 
wrote the manuscript. All authors discussed the results and commented on the 
manuscript. 


Author Information The data presented here are available at http://dx.doi.org/ 
10.5518/6. Reprints and permissions information is available at www.nature.com/ 
reprints. The authors declare no competing financial interests. Readers are welcome to 
comment on the online version of the paper. Correspondence and requests for 
materials should be addressed to O.C. (o.cespedes@leeds.ac.uk). 


6 AUGUST 2015 | VOL 524 | NATURE | 73 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


METHODS 


Magnetic measurements were taken using a superconducting quantum interfer- 
ence device operated as a vibrating sample magnetometer (SQUID-VSM), model 
MPMS3 from Quantum Design with resolution better than 10° emu. The thin 
films were deposited on 0.5-mm thick Si/SiO, substrates. Metals were deposited by 
DC magnetron sputtering at a pressure of approximately 2.5 mbar (24s.c.c.m. of 
Ar; 10” * mbar base pressure) with a deposition rate of 1-3 As. Cgo films were 
deposited by thermal evaporation from a sublimed, 99.9%-purity source in an 
AlL,0O; boat in the same chamber at approximately 107 * mbar and with deposition 
rates of 0.5-1 As. Al,O; films were grown via plasma oxidation of Al films: O2 
flow of 76 s.c.c.m., 35 mA current. Oxygen is highly detrimental to the emergent 
magnetism, and samples grown in a poor vacuum (Ptotal= 2X 10° or 
P,, >5 x 10~!°mbar) show no magnetization. Ta seed layers are used to decrease 
the sample roughness. Our thermally sublimed Cg films are relatively rough when 
compared to sputtered metallic films (about 1 nm r.m.s. roughness for Cg9 com- 
pared to <0.5nm in metals). The metallic films are continuous and there 
is negligible diffusion into the molecular film as seen in low-angle X-rays 
(Supplementary Fig. 3). Cross-sections of representative samples were analysed 
by transmission electron microscopy, which showed that the metallic layers are 
continuous and the Ceo layers are polycrystalline (Supplementary Fig. 27). The 
films experience a magnetic field strength of approximately 0.3kOe during 
growth, owing to an in-plane magnet and the field from the magnetron gun. 
They are also subject to field strengths of about 1-5 kOe during loading and 
centring in the SQUID-VSM, which is needed to position the sample with respect 
to the SQUID sensor. 

Low-energy muon spin spectroscopy” uses positive muons to provide a probe 
of local magnetization. Positive muons are implanted into a sample and decay into 
a detectable positron and a neutrino/anti-neutrino pair. Owing to charge-parity 
violation, there is a preferred direction of emission of the positrons along the 
muons’ spin vector. Determining the direction of the positron decay allows us 
to determine the precession of the muon spin and, therefore, the local field at the 
muon implantation site. A polarized high-intensity beam of energetic (MeV) 
positive muons” is obtained from the decay of 2*, generated by a proton 
beam impinging on a graphite target. After moderation in a cryogenic solid Ar 
moderator where the beam polarization is conserved’’, the anti-muons are 


re-accelerated electrostatically to keV energies and transported by electrostatic 
elements to the sample. Positrons emitted from muon decay are detected by 
two plastic scintillator rings and the difference between the fluxes observed at 
these two detectors is used to determine the instantaneous spin direction of 
implanted positive muons as a function of time. The muon asymmetry is then 
calculated using A(t) = [N.(t) — Ng(4)]/[NL@ + Na(t)], where Ny p(t) are the 
background-corrected decay histograms of the left and right positron detectors, 
respectively. The error of each bin count n is given by the standard deviation of n. 
The errors of each bin in A(t) are then calculated by standard error propagation. 

Standard, fixed spin-moment™ and non-collinear van der Waals corrected 
density functional theory simulations were done using the projected augmented 
wave method as implemented in the VASP program’’. We used the PBE exchange- 
correlation functional*®, a 400 eV plane-wave energy cut-off, (0.2 eV, first-order) 
Methfessel—Paxton electronic smearing”’, and a 10-symmetry-irreducible k-point 
grid for the single crystal 7-vacancy Cu(111) 4 X 4 reconstruction models. For the 
Cu{Cgo} and Ceo{Cu} 8 X 8 reconstruction models, only one k point was used. The 
adopted atomic-force threshold for geometry optimization was 0.02 eV A7'. For 
the 7-vacancy model we relaxed the five topmost Cu layers together with all the 
atoms of the Cg molecule. All the atoms of the Cu{Cgo} and Cgo{Cu} models were 
relaxed. In all cases, a vacuum separation of at least 12 A was present between 
replicated images of the interface models. 
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The prediction and synthesis of new crystal structures enable the 
targeted preparation of materials with desired properties. Among 
porous solids, this has been achieved for metal-organic frame- 
works’ *, but not for the more widely applicable zeolites*’, where 
new materials are usually discovered using exploratory synthesis. 
Although millions of hypothetical zeolite structures have been 
proposed®’, not enough is known about their synthesis mechanism 
to allow any given structure to be prepared. Here we present an 
approach that combines structure solution with structure predic- 
tion, and inspires the targeted synthesis of new super-complex 
zeolites. We used electron diffraction to identify a family of related 
structures and to discover the structural ‘coding’ within them. This 
allowed us to determine the complex, and previously unknown, 
structure of zeolite ZSM-25 (ref. 8), which has the largest unit-cell 
volume of all known zeolites (91,554 cubic angstréms) and demon- 
strates selective CO, adsorption. By extending our method, we 
were able to predict other members of a family of increasingly 
complex, but structurally related, zeolites and to synthesize two 
more-complex zeolites in the family, PST-20 and PST-25, with 
much larger cell volumes (166,988 and 275,178 cubic angstr6ms, 
respectively) and similar selective adsorption properties. Members 
of this family have the same symmetry, but an expanding unit cell, 
and are related by hitherto unrecognized structural principles; we 
call these family members embedded isoreticular zeolite structures. 

New porous materials with designed structures and properties have 
been synthesized from metal-organic frameworks (MOFs)' by assem- 
bling inorganic and organic building units of defined geometry to give 
frameworks with predictable topology and functionality**. This degree 
of control is difficult to achieve for purely inorganic frameworks*”. 
Geometrically related structures have been prepared, for example 
using enlarged clusters”’® or extended inorganic chains as building 
units'!, but the former requires major changes in framework chemistry 
and synthesis conditions and the latter uses organic templates that 
cannot be removed without structural collapse. For the most industri- 
ally important class of microporous materials, zeolites, which have 
fully connected frameworks of corner-sharing AlO, and SiO, tetra- 
hedra, there are no examples where new structures have been designed 
and then directly prepared. Millions of energetically feasible hypothet- 
ical zeolite ‘structures’ have been predicted®’, but routes to their syn- 
thesis remain elusive. 

Even when new zeolites are prepared, through exploratory syn- 
thesis, their structure solution takes time because they crystallize 
as powders. Nevertheless, complex zeolite structures can be solved, 
usually with help from an electron microscope’. In one approach, 
powder X-ray diffraction (PXRD) intensity data are combined with 
structure factor phase information obtained from high-resolution 
transmission electron microscopy (HRTEM) images'*"*; in another, 
rotation electron diffraction (RED)'””’ is applied to crystals less than a 


micrometre in size'””’. Here, we used electron diffraction to solve 
a complex, unknown zeolite structure related to paulingite; using 
the ‘strong reflections’ method”, we discovered that this unknown 
zeolite structure and paulingite have the same structural ‘coding’. 
We extended this method to predict a family of highly complex 
zeolite frameworks with unit-cell volumes in excess of any 
previously reported volumes, and prepared two of them via rational 
synthesis. 

ZSM-25, first reported in 1981°, was synthesized according to meth- 
ods in the literature using Na* and tetraethylammonium (TEA ) ions 
as structure directing agents (SDAs)”’, as part of our search for select- 
ive adsorbents (see Methods). It showed good CO, adsorption prop- 
erties (described below), but its structure was not known. We therefore 
applied the RED method to ZSM-25 (NaTEA-ZSM-25) microcrystals 
(Fig. 1a, b, Methods, Extended Data Fig. 2a—c). The three-dimensional 
(3D) RED data revealed that ZSM-25 is body-centred cubic (unit-cell 
edge length a = 42.3 A) with Laue symmetry m3m. However, electron 
beam damage causes low data resolution and prevents structure solu- 
tion using direct methods. The IZA Database of Zeolite Structures** 
lists three frameworks with the same Laue symmetry as ZSM-25: KFI 
(ZK-5), RHO (Rho) and PAU (paulingite) all have the same space 
group Im3m. Further, we found that the strong reflections of ZSM-25 
are distributed in the same regions of reciprocal space as those calcu- 
lated for RHO and PAU (Fig. la—d, Extended Data Fig. 3), indicating 
the RHO, PAU and ZSM-25 structures are related. Strong reflections 
represent the main structural features of a crystal and can be used for 
structure solution”’. We therefore thought that it might be possible to 
phase the strong reflections of ZSM-25 from the known PAU struc- 
ture, and thus solve its structure. The 21 strongest symmetry-inde- 
pendent reflections were identified, and their phases assigned to be 
those calculated from corresponding reflections of the PAU structure 
(Fig. 1c, d, Extended Data Table 1). All 16 symmetry-independent T 
atoms (T = Si, Al) were located from the 3D electron-density map 
using the 21 reflections: oxygen atoms were placed between the T 
atoms according to TO, tetrahedral geometry. The structure of as- 
made NaTEA-ZSM-25, including its aluminosilicate framework and 
extra-framework cation and water positions, was refined against syn- 
chrotron PXRD data (Fig. 2a, Methods). 

The ZSM-25 framework can be considered an expanded version of 
PAU. Both are built of seven different cage types’; these have the face 
symbols and three-letter codes [4'7688°] (ita), [4587] (d8r), [4'°8°] 
(pau), [4°678°] (t-plg), [4°8°] (t-oto), [4°8*] (t-gsm) and [4’8°] (t-phi); 
Fig. 2b. Face symbols give the number of faces per cage with the 
specified number of sides: for example, [4°87] denotes cages with eight 
four-sided faces and two octagonal faces. The maximum ring size in 
each is eight, which establishes them as small-pore zeolites. The Ita 
cages are connected via chains of alternating d8r and pau cages along 
unit-cell edges to form cubic scaffolds (Fig. 2c, d). The scaffold of 
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Figure 1 | Structure determination of ZSM-25 
using the strong reflections approach. a, b, The 
2D slice of (hk0) (a) and (hkh +k) (b) cut from 
the reconstructed 3D reciprocal lattice from 

the RED data. The symmetry m3m has been 
superimposed to allow for a better comparison. 
c, d, Simulated (hk) (c) and (hkh+k) 

(d) diffraction patterns of the idealized 

PAU structure, with the structure factor phases 
marked in blue (180°) and red (0°). e, 3D map 
generated by using amplitudes obtained from 
RED of ZSM-25 and phases calculated from the 
structure of PAU. f, The framework structure of 
ZSM-25. 


Figure 2 | PXRD profiles and description of 
the structure of ZSM-25. a, Rietveld refinement of 
as-made NaTEA-ZSM-25 (X-ray wavelength 

A = 0.63248 A). The inset intensities are scaled by 
a factor of six. b, The seven different cages, 
[4'76°8°] (Ita), [4°87] (d8r), [4'78°] (pau), [4°6°8°] 
(t-plg), [4°8°] (t-oto), [4°8*] (t-gsm) and [4’8°] 
(t-phi), found in ZSM-25, as solid tiles. c, d, The 
connectivity of the Ita, d8r and pau cages in 

PAU (c) and ZSM-25 (d), showing the 
interpenetration of the two cubic scaffolds. The 
sequence is Ita-d8r-pau-d8r-pau-d8r-lIta for 
PAU and Ita-d8r—pau-d8r-pau-d8r—pau-d8r-lta 
for ZSM-25. e, f, The 3D framework structure of 
PAU (e) and ZSM-25 (f) with t-plg, t-oto, t-gsm and 
t-phi cages embedded in the scaffolds. 
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ZSM-25 is extended from that of PAU by adding an extra pair of 
pau and d8r cages along each unit-cell edge, which expands a by 
approximately 10 A. In accordance with the body-centred structure, 
each structure contains two such cubic scaffolds, interpenetrated. The 
space between the scaffolds is filled by the four other types of cages to 
form fully four-connected frameworks (Fig. 2e, f). All cages are inter- 
connected via 8-ring windows. The structure of RHO can be obtained 
by removing two pairs of pau and d8r cages from each unit-cell edge, 
leaving only one d8r cage between the Ita cages (Extended Data Fig. 6). 
RHO, PAU and ZSM-25 belong to the same family. PAU and ZSM-25 
can be considered expanded versions of RHO. We call this the RHO 
family, and denote Rho to be the first generation (RHO-G1), paulin- 
gite the third (RHO-G3) and ZSM-25 the fourth (RHO-G4). It is 
interesting to predict the structure of other family members. While 
the structure of RHO-G2 with two d8r and one pau cages per unit-cell 
edge (a ~ 25 A) was generated previously”*”’, it is much more chal- 
lenging to predict larger structures by modelling how the large space 
between the cubic scaffolds should be filled. 

We anticipated that the structure relationship (structural “coding’) 
of the higher members of the RHO family would also be reflected in 
reciprocal space, and that this could be exploited for structure predic- 
tion. We found that the structure factors of the strong reflections for 
ZSM-25 and PAU are indeed very similar (Fig. 3a, Extended Data 


c 


4, 
Na 


bmi Oe tlh) 


Fig. 3). The framework of ZSM-25 could be predicted solely from 
the related PAU framework, without using any experimental diffrac- 
tion data from ZSM-25 (Methods). We applied the same approach to 
predict the structures of other members: RHO-G2 from PAU (RHO- 
G3), RHO-G5 from ZSM-25 (RHO-G4) and RHO-G6 from RHO-G5 
(Fig. 3b, c, Methods, Extended Data Fig. 5). The final energies per 
SiO, unit as a function of framework density for RHO-G1 to RHO- 
G6 are consistent with the trends observed for known structures 
(Supplementary Fig. 2, Supplementary Table 12), indicating that they 
are all energetically feasible. In principle, the number of members in 
the RHO family is limitless. New zeolites with ever-greater unit-cell 
volume and complexity are achieved by adding new pairs of d8r and 
pau cages, and their structures can be predicted using a similar 
approach. 

Except for RHO-G1 and RHO-G2, all other members consist of the 
same seven cages (Fig. 3c, Extended Data Fig. 6, Supplementary Tables 
13, 14) and every T atom is part of three 4-rings. We think that these 
common motifs arise as a consequence of a dominating aluminosili- 
cate crystallization pathway. That both ECR-18 (PAU) and ZSM-25 
were synthesized using TEA* and Na” as SDAs, together with K* in 
the case of ECR-18, led us to speculate that the larger members (for 
example, RHO-G5 and RHO-G6) of this family could also be synthe- 
sized using these SDAs, in concert with other inorganic cations. 


Figure 3 | Comparison of the reflection 
distributions and framework structures of RHO- 
G3 to RHO-G6. a, The (hk0) reciprocal plane 
showing the similar amplitude and phase 
distribution of the strong reflections of RHO-G3 to 
RHO-G6. Reflections in red have phases of 0°, 
while those in blue have phases of 180°. The red, 
green and blue circles correspond to d-spacings 
of 1.0 A, 1.6 A and 3.0 A, respectively. b, c, 
Polyhedral (b) and tiling (c) representations of 
cross-sections (about 12 A thick) of RHO-G3 to 
RHO-G6. The crystals corresponding to RHO-G3 
to RHO-G6 were synthesized as ECR-18, ZSM- 
25, PST-20 and PST-25, respectively. The 
arrangement in the centre alternates every second 
structure in (c), that is, it is similar for RHO-G3 
and RHO-G5, and RHO-G4 and RHO-G6. 


44.22 A 


54.07A 


RHO-G6 
(PST-25) 
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Examination of the evolution of the numbers of different cages in 
the RHO family showed that the numbers of t-oto, t-gsm and t-phi 
cages grow much faster than those of the other four cage types 
(Supplementary Table 13). Furthermore, we were aware that the nat- 
ural zeolites gismondine (GIS) and phillipsite (PHI), which contain 
t-gsm cages only (GIS) and t-oto and t-phi cages (PHI) as building 
units, possess substantial amounts of alkaline-earth cations such as 
Ca** and even Ba** as extra-framework cations”. This prompted 
us to introduce small amounts of different alkaline-earth cations to 
the ZSM-25 synthesis mixture to promote the preferential formation 
of t-oto, t-gsm and t-phi cages and thus to favour crystallization of the 
more-complex members of the RHO family. 

Following the strategy described above, we were able to synthesize 
the hypothetical RHO-G5 phase, denoted PST-20 (Methods, Extended 
Data Fig. 1b, Supplementary Tables 1, 2). Its successful synthesis was 
confirmed by RED (Extended Data Fig. 2d-f) and Rietveld refinement 
(Extended Data Fig. 4). Although the crystallization of PST-20 was 
sensitive to synthesis temperature and time, the presence of the alkal- 
ine-earth cations Ca”* and, in particular, Sr?” is required to direct its 
crystallization. A pure sample of PST-20 was successfully prepared by 
addition of Sr** to the synthesis gel. Subsequent structural analysis 
revealed that the Sr** cations are located mainly within the 8-rings of 
its t-oto, t-gsm and t-phi cages (Supplementary Fig. 6), validating our 
approach. Following the same rational approach, modification of the 
synthesis conditions by the addition of both Sr** and Ca’* to the gel 
composition that gives ZSM-25 resulted, after hydrothermal treat- 
ment, in products that contained crystals of RHO-G6, the next, even 
more complex zeolite in the RHO family (Supplementary Fig. 9, 
Supplementary Table 3). Further work is in progress to obtain the pure 
form of this material, which we denote PST-25. 

As with all members of this family, ZSM-25 and PST-20 (and PST- 
25) are accessible to molecules that can pass through 8-rings, and so 
they are potentially useful as small-molecule adsorbents. Removal 
of CO, from natural gas or from flue gases” is one area of current 
interest for small-pore zeolites. We found that NaTEA-ZSM-25 and 
Na*-exchanged NaSrTEA-PST-20 (denoted NaTEA-PST-20) show 
similarly high uptakes of CO, and low uptakes of N2 and CH, 
(Fig. 4, Methods, Extended Data Table 2). The CO2/CH, selectivity 
for all members of the RHO family is high, and much greater than 
that exhibited by the K-chabazite that we examined (Extended Data 
Table 2). We attribute this to the effect of cation gating, where cations 
blocking 8-ring windows in the structures are able to move to allow the 
passage of gas molecules that strongly interact with them, such as CO,, 
but remain in place in the presence of weakly interacting molecules”. 
Moreover, the CO, uptakes remained the same over 100 adsorption- 
desorption cycles (insets of Fig. 4a, b). The CO uptake at 1.0 bar and 
298 K was 3.5mmolg | for NaTEA-ZSM-25 and 3.2mmolg ' for 
NaTEA-PST-20. These CO, uptakes are somewhat lower than that 
of Na-Rho (4.5 mmol gat 1.0 bar and 298 K), but they are compar- 
able with those observed for other well studied small-pore zeolites such 
as K-chabazite (CHA, 3.6mmolg '). More notably, although CO, 
adsorption on Na-Rho reached equilibrium only after about 2h, 
uptake on NaTEA-ECR-18 was faster (equilibrating in 5 min), and 
NaTEA-ZSM-25 and NaTEA-PST-20 achieved equilibrium more 
quickly still (after about 2 min) (Fig. 4c). Given their selective adsorp- 
tion, fast kinetics and long-term stability, NaTEA-ZSM-25 and 
NaTEA-PST-20 are of potential interest as CO, adsorbents. 

Structure expansion in the RHO family operates at two levels 
(Fig. 3c, Extended Data Fig. 6, Supplementary Figs 4c, 5c). First, the 
twofold interpenetrated scaffold is expanded by inserting pau and d8r 
cages along each unit-cell edge. Second, the space between the scaffolds 
is filled by four other cage types to form rigid, fully four-connected 
frameworks. The former expansion is isoreticular, as seen in MOFs'*, 
whereas the latter filling occurs by embedding four different cages in 
the inter-scaffold space. We call frameworks resulting from this prin- 
ciple of structure expansion ‘embedded isoreticular’; the RHO family is 
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Figure 4 | Gas adsorption properties of NaTEA-ZSM-25 and NaTEA- 
PST-20. a, b, Adsorption isotherms at 298 K of CO, (navy), CH4 (green) 
and Nj (pink) for NaTEA-ZSM-25 (a) and NaTEA-PST-20 (b). Inset, COz 
adsorption-desorption cycles at 343 K. c, CO2 adsorption kinetics at 298 K and 
1.2 bar of NaTEA-ZSM-25 (violet), NaTEA-PST-20 (orange), NaTEA-ECR-18 
(navy), Na-Rho (pink) and K-chabazite (green). Inset, zoom of the CO, 
adsorption kinetics for the first 5 min. 


the first example. Although other families of expanded structures 
have the same topology and enlarged pore sizes**"' (Supplementary 
Fig. 4a, b), the RHO-family members have different topologies but 
similar pore sizes (Supplementary Fig. 5b, c, Supplementary Tables 
10, 11). The structural relationships among the RHO-family members 
become clear in reciprocal space, through the similar amplitude and 
phase distribution of reflections. This structural ‘coding’ is useful both 
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for structure solution and for prediction of new family members. It has 
enabled the syntheses of new zeolites with huge unit cells from chem- 
ically relatively simple systems—ZSM-25, PST-20 and PST-25 are the 
largest zeolites so far by unit-cell volume—and it suggests a route to the 
rational synthesis of certain classes of zeolites. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Zeolite syntheses. ZSM-25 was synthesized from aluminosilicate gels with a very 
narrow range of SiO2/Al,03 and Na2O/SiO) ratios in the presence of TEABr, as 
reported by several groups****'. In a typical synthesis of ZSM-25, 1.92 g of 
Al(OH)3°1.0H20 were first mixed with a solution of 3.04g of NaOH solution 
(50%, Aldrich) in 60.73g of distilled water. To the resulting clear solution, 
10.80 g of Ludox AS-40 (DuPont) and 11.15g of TEABr (98%, Aldrich) were 
added. The resulting gel composition was 1.9Na,O-1.0Al,03°5.2TEABr- 
7.2SiO2°390H,0. The final synthesis mixture was stirred at room temperature 
for one day, charged into Teflon-lined 23-ml autoclaves and heated at 408 K under 
rotation (60 r.p.m.) for 7 days. 

PST-20 was synthesized using the organic SDA, TEA”, together with two 
inorganic SDAs, Na‘ and Sr?* cations. In a typical synthesis of PST-20, 1.92 g 
of Al(OH)3-1.0H2O were first mixed with a solution of 3.04g of 50% NaOH 
solution in 60.73 g of distilled water. To the resulting clear solution, 10.80 g of 
Ludox AS-40, 1.07 g of Sr(NO3)2 (Aldrich) and 11.15 g of TEABr were added. 
The resulting gel composition was 1.9Na,O-0.5SrO-1.0Al,03°5.2TEABr-7.2SiO)° 
390H,0. If required, seed crystals (2 wt% of anhydrous raw materials) were added 
to this gel. The seed crystals used here were PST-20 zeolite containing a small 
amount of ZSM-25 (<20%, according to PXRD analysis), which was previously 
prepared at 418K for 4 days. The final synthesis mixture was stirred at room 
temperature for one day, charged into Teflon-lined 23-ml autoclaves, and heated 
at 418 K under rotation (60 r.p.m.) for 2 days. Further details of PST-20 synthesis 
are given in Supplementary Tables 1 and 2. 

The solid products were recovered by filtration, washed repeatedly with water, 

and then dried overnight at room temperature. As-made ZSM-25 and PST-20 
samples were characterized by PXRD, and *7Al and *°Si solid-state magic-angle 
spinning NMR (Extended Data Fig. 1). The samples were calcined at 773 K in air 
for 8h. PXRD patterns show that ZSM-25 retained its crystallinity but PST-20 lost 
crystallinity upon calcination. As-made PST-20 (NaSrTEA-PST-20) was refluxed 
twice in 1.0 M NaNO; solution at 353 K for 6 h (2.0 g solid per 100 ml solution) to 
ensure that it was in its Na‘-TEA* form (denoted NaTEA-PST-20). For com- 
parison, ECR-18 (PAU), zeolite Rho (RHO), and chabazite (CHA) with similar 
Si/Al ratios were also synthesized according to the procedures reported in the 
literature°°?> and converted to their Na~ or K* forms. 
Collection of rotation electron diffraction (RED) data. For RED data collection, 
powders of as-made NaTEA-ZSM-25 and NaSrTEA-PST-20 samples were dis- 
persed in absolute ethanol and treated by ultrasonic treatment for 2 min. A droplet 
of the suspension was transferred onto a carbon-coated copper grid and dried in 
air. The 3D RED data were collected on a JEOL JEM2100 TEM at 200 kV using the 
RED-data collection software’’. A single-tilt tomography sample holder was used 
for the data collection. The electron diffraction frames were recorded on a 12-bit 
Gatan ES500W Erlangshen camera side-mounted at a 35 mm port. For NaTEA- 
ZSM-25, the tilt step was 0.10° and the exposure time was 3.0s per electron 
diffraction frame. The tilt range was 76.71° and the total data collection time 
was about 70 min. Because NaSrTEA-PST-20 was more electron-beam sensitive 
than NaTEA-ZSM-25, a shorter data collection time (17 min) was used, with a 
larger tilt step (0.20°), a shorter exposure time (1.0 s per electron diffraction frame) 
and a tilt range of 49.98° (Supplementary Table 4). 

The data processing was performed using the software RED-data processing". 
The unit cell was determined from the positions of the diffraction spots detected in 
the electron diffraction frames. The RED data show that both NaTEA-ZSM-25 
and NaSrTEA-PST-20 are body-centred cubic with the Laue symmetry of m3m 
(Extended Data Fig. 2). The unit-cell parameter determined from the RED data 
was a=42.3A for NaTEA-ZSM-25 and a=52.4A for NaSrTEA-PST-20 
(Supplementary Table 4). The reflection conditions were deduced from the recon- 
structed reciprocal lattice to be hkl: h + k + 1= 2n, hkO: h + k = 2n, hkh: k = 2n, 
O0/: 1 = 2n. From the Laue symmetry and reflection conditions, the possible space 
groups are 1432 (No. 211), 143m (No. 217), and Im3m (No. 229). The intensity for 
each reflection was extracted from the electron diffraction frame with the highest 
intensity value. The final list of reflections with the indices and intensity was 
output to an HKI file for SHELX™*. 

Structure determination of ZSM-25. Three zeolite frameworks were identified 
that have the same Laue group as ZSM-25: ZK-5 (KFI, a = 18.75 A), zeolite Rho 
(RHO, a= 15.03 A) and paulingite*® (PAU, a = 35.09 A). The crystallographic 
structure factors were calculated from the atomic coordinates of the idealized 
framework given in the Database of Zeolite Structures”. It was found that the 
strong reflections of ZSM-25 are distributed in the same locations in reciprocal 
space as those calculated from PAU (Fig. 2a—d). Twenty-one symmetry-independ- 
ent reflections up to 2.5A resolution with amplitudes larger than 30% of the 
strongest reflection were identified from the RED data, and their phases were 
assigned to be those of structure factor phases calculated from corresponding 
reflections of the PAU structure (Fig. 2c, d, Extended Data Table 1). The 
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indices of the required corresponding reflections in the PAU structure were 
obtained by scaling according to the unit cells: hpay = hzgu-25 X Apau/azsm-255 
kpau = kzsm-2s X @pau/azsm-25> pau = Izsm-25 X @pau/azsm-25 (Extended Data 
Table 1). The 3D electron-density map was calculated by inverse Fourier trans- 
formation from the amplitudes and phases of these strong reflections using 
the SUPERFLIP software (Fig. 2e)**. All 16 symmetry-independent T atoms 
(T = Si, Al) were located from the 3D electron-density map using the software 
EDMA”. The oxygen atoms were placed between the T atoms according to SiO4 
tetrahedral geometry. The final model is a four-connected 3D framework (Fig. 26), 
which was geometrically optimized using TOPAS Academic 4.1°°. Every T atom 
is part of three 4-rings (in two different chains of 4-rings), accounting for the 
characteristic infrared and Raman spectra that have been reported previously for 
ZSM-25”°. 

Rietveld refinement (ZSM-25 and PST-20) and profile fitting (PST-25). High- 
resolution PXRD data of as-made NaTEA-ZSM-25 were collected at room tem- 
perature at experimental station ID31 at the ESRF, Grenoble (X-ray wavelength 
2. = 0.632480 A). PXRD data of calcined NaTEA-ZSM-25 (Extended Data Fig. la) 
was obtained in flat plate mode using a PANalytical X’Pert PRO diffractometer 
(2 = 1.5418 A). High-resolution PXRD data of as-made and Na*-exchanged 
NaSrTEA-PST-20 were collected at 100K at experimental station ID22 at the 
ESRF, Grenoble (A = 0.40091 A). The samples were sealed in glass capillaries of 
0.7mm diameter. Rietveld refinement was performed using TOPAS Academic 
V4.1**. High-resolution PXRD data of a sample with a mixture of PST-25 and PST- 
20 (Run 18, Supplementary Table 3) were collected in flat-plate mode on the 9B 
beamline at the Pohang Acceleration Laboratory, South Korea (A = 1.4640 A). 
Profile fitting was performed in the 20 range of 10°-70° by the LeBail method” 
using the GSAS suite of programs”. 

For NaTEA-ZSM-25, the background was fitted with a 16th-order Chebychev 
polynomial. The refinement was conducted using a PearsonVII peak profile func- 
tion, followed by refinement of unit cell (a = 45.0711(3) A) and zero-shift. The 
chemical formula was deduced from EDS, TGA and CHN analyses to be 
|(N(C2Hs)4)40Naogs(H2O)<00| [Sir115Al325O2gg0]- The organic TEA* cations were 
suggested by molecular modelling to be located in the pau and t-plg cages 
(Supplementary Fig. 1 and Supplementary Table 5), and Na* and water molecule 
positions were arrived at by comparison with the structure of as-made paulingite*’ 
and by difference Fourier analysis. Considering the ratio of Si/Al = 3.4, soft 
restraints were placed on the T-O distances (1.64 A, T=Si, Al) and O::-O 
distances (2.68 A) within the TO, tetrahedra. All T positions were refined with 
the same, and fixed, occupancies. Additional Na’ cations and guest water mole- 
cules were located from the difference Fourier maps by fixing the framework of 
ZSM-25. All atomic positions were refined in the final cycles. The Debye-Waller 
factors of T, O, C and N atoms were fixed to 0.8, 1, 10 and 10, respectively, while 
those of Na* and water molecules were refined. The final refinement converged to 
a weighted -profile-fit R factor Ry, = 0.0537, a profile-fit R factor R, = 0.0414 and 
goodness of fit GOF = 2.87 (Fig. 3a, Supplementary Table 6). 

There are 16 symmetry-independent T-atom positions, 40 oxygen-atom posi- 
tions, four TEA* locations, 13 Na*-cation positions and 24 water-molecule loca- 
tions in NaTEA-ZSM-25. Most Na” cations are located in the 8-rings of the t-oto, 
d8r and t-gsm cages; some of them are partially occupied and sometimes share the 
same positions with guest water molecules. A Na* cation (Nal2, occupancy of 
0.51) is found at the 6-ring connecting the /ta and t-plg cages. There are about 296 
Na? cations in one unit cell, which is consistent with the chemical analysis. The 
TEA” cations are disordered in the pau and t-plg cages. The final refinement 
shows that there is one TEA* in each pau cage, and 0.85 and 0.80 TEA® in 
t-plg cages (there are two symmetry-independent t-plg cages). The t-plg cages 
contain both TEA cations and guest water molecules, with a total occupancy 
of 1.0. The final framework structure has reasonable T-O bond distances 
(1.64 + 0.02 A), O-T-O angles (109.3 + 4.5°) and T-O-T angles (132-159°). 

Rietveld refinement of the calcined, hydrated ZSM-25 was carried out in a 
similar way to that of NaTEA-ZSM-25, with the obvious difference that no 
TEA’ cations remain in the solid (a = 44.9242(16) A, Supplementary Fig. 7 and 
Supplementary Table 7). 

For as-made NaSrTEA-PST-20, the unit-cell formula derived by elemental 
analysis and structure refinement was |(N(C:Hs)4)s6Na162St210(H2O)s6s| 
[Al¢3gSizo02Os2g0]. The starting structure was based on the model of RHO-G5 
established during the prediction of larger structures of the RHO family. The 
background was fitted with a 30th-order Chebychev polynomial. The refinement 
was conducted using a TCHZ peak profile function, followed by refinement of unit 
cell (a = 55.0437(16) A) and zero-shift. Soft restraints were placed on the T-O 
distances (1.64A, T = Si, Al) and O---O distances (2.68 A) within the TO, tet- 
rahedra. All T positions were refined with the same, and fixed, occupancies. The 
location of TEA* cations was modelled using the positions obtained from the 
structural model of NaTEA-ZSM-25, where TEA“ cations were in the pau and 
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t-plg cages. The Na*/Sr* cations were either allocated from difference Fourier 
maps or placed in similar sites as those in NaTEA-ZSM-25. These cations were 
mostly in the 8-ring sites throughout the structure. When the fractional occupan- 
cies of cations refined to values considerably greater than 1 when input as ‘Na*’, 
they were instead included as more strongly scattering Sr’* cations and their 
occupancies were refined without any restrictions. Each Na” or Sr** site was then 
modelled with a mixed occupancy with water oxygen. In this way six sites were 
identified as unambiguously containing Sr** cations (Sr1-Sr6). Additional scat- 
tering identified from difference Fourier mapping was included as water oxygen. 
The Debye-Waller factors of T, O, Na/Sr, water molecules and C(N) in the TEA* 
ions were fixed to 1, 1.5, 3, 4 and 5, respectively and all fractional atomic coordi- 
nates were refined in the final cycles. The refinement converged to Ry» = 0.0791, 
R, = 0.0569 and GOF = 4.396 (Extended Data Fig. 4a, Supplementary Table 8). 

For NaTEA-PST-20, the chemical formula obtained from the elemental 
analysis and structure refinement was |[(N(C2Hs)4]s6Nase0St11(H2O)ssel 
[Alg3gSiz002Os280]; a small amount of Sr** cations still remained after the ion 
exchange. Rietveld refinement was carried out in a similar way to that of as-made 
NaSrTEA-PST-20 and the refined unit cell was a = 55.0664(7) A. The locations of 
TEA* and Na” cations were modelled using the positions obtained from the 
structure model of NaTEA-ZSM-25. Additional guest water molecules were 
located from the difference Fourier maps by fixing the framework of NaTEA- 
PST-20. The Debye-Waller factors of T, O, Na/Sr, water molecules and C(N) in 
the TEA” ions were fixed to 1, 2,3, 4and5, respectively and all fractional atomic 
coordinates refined in the final cycles. The refinement converged to R,, = 0.0883, 
R, = 0.0653 and GOF = 3.59 (Extended Data Fig. 4b, Supplementary Table 9). 

During the refinement of NaTEA-PST-20, some unindexed peaks were iden- 
tified, which could be attributed to the minor impurity phases ZSM-25 and (for 
some smaller peaks) the even larger RHO-family member RHO-G6. Thus, the 
structure models of NaTEA-PST-20 and NaTEA-ZSM-25 were both included in 
the refinement. Considering the complexity of the two structures and the number 
of parameters, only the TCHZ peak profile function, the zero-shift, the back- 
ground with a 17th-order Chebychev polynomial, and the unit cells of the two 
structures were refined. The atomic positions and thermal parameters were 
fixed on the basis of the two structure models. The refinement was improved 
and converged to Ryp = 0.0793, R, = 0.0593 and GOF = 3.16 (Supplementary 
Fig. 8, Supplementary Table 9), with 92.5 wt% of PST-20 and 7.5 wt% of ZSM- 
25 in the sample. 

The synchrotron PXRD pattern of the sample from Run 18 (Supplementary 
Table 3 and Supplementary Fig. 9a) was compared to those calculated on the basis 
of the structure models of PST-20 and the hypothetical RHO-G6, which indicated 
that the sample is a mixture of RHO-G6 (denoted PST-25) and PST-20, with about 
75% PST-25. The two-phase LeBail refinement based on PST-25 and PST-20 
resulted in a good agreement between the observed and the calculated profiles 
(Supplementary Fig. 9b; Rwp = 0.0221, Ry = 0.0142), and the unit-cell parameters, 
a = 55.0270(5) A for PST-20 and a = 65.0436(4) A for PST-25. 

Prediction of ZSM-25 from PAU on the basis of strong reflections. Inspired by 
the successful structure solution of ZSM-25 by phasing the RED data using the 
related PAU structure, we investigated the possibility of deducing the structure of 
ZSM-25 solely from the PAU structure. We compared the structure factors cal- 
culated from the frameworks of PAU and ZSM-25, and found that the intensity 
distribution of reflections is similar and the phases of the strong reflections are 
the same, as shown in Fig. 3a and Extended Data Fig. 3. We selected the 133 
strongest symmetry-independent reflections of PAU with normalized structure 
factor E>1.2 and d-spacing d>1.00A to predict the structure of ZSM-25 
(Supplementary Table 15). The structure factor amplitudes and phases of these 
strong reflections were transposed to be those of a ‘hypothetical’ ZSM-25 by 
converting the reflection indices according to hzsm-25 = hpau X azsm-25/dpaus 
kzsm-25 = kpau X @zsm-25/Apau !zsm-25 = Ipau X azsm-25/@pau and taking the 
nearest integers. A 3D electron-density map was calculated (Extended Data Fig. 
5d), and all 16 T atoms and 31 out of 40 oxygen atoms in the asymmetric unit were 
located. A complete ZSM-25 framework could be obtained by adding the nine 
missing oxygen atoms geometrically between the T atoms (Extended Data Fig. 5e). 
Compared to the 3D electrostatic-potential map obtained from RED (Fig. 2e), the 
3D electron-density map deduced from PAU (Extended Data Fig. 5d) has higher 
resolution so that most of the oxygen atoms could be resolved from the density 
map. This showed that the framework structure of ZSM-25 could be predicted 
solely from the related PAU framework, without using any ZSM-25 experimental 
diffraction data. 

Prediction of new structures in the RHO family. The structure of RHO-G2 
(Extended Data Fig. 5b) (a ~ 25 A) was predicted previously**”’. The prediction 


of larger structures, for example RHO-G5 (a ~ 55 A), RHO-G6 (a ~ 65 A) and so 
on, is very challenging. Although we know the unit cell, space group and partial 
structures (the cubic scaffolds) of RHO-G5 and RHO-G6, it is difficult to fill the 
remaining empty space between the cubic scaffolds by model building to complete 
these two structure models manually. We therefore used the strong reflections 
method we developed above to predict the structure model of RHO-G5 from 
RHO-G4, and the structure model of RHO-G6 from RHO-GS. Structure factor 
amplitudes and phases were calculated from the idealized structure model of 
RHO-G4 (ZSM-25). The 470 strongest reflections with E> 1.2 and d> 1.00A 
were selected (Supplementary Table 16). The indices of each strong reflection of 
RHO-G5 were calculated from the indices of the corresponding reflection 
of RHO-G4 according to hpyyo-cs = Mruo-ca X 4ruo-cs/4guo.ca kKruo-cs = 
krvo-Ga X 4ru0-cs/4rH0-64 Iru0-cs = IrHo-c4 X @ruo-cs/4xno-c4- The 3D elec- 
tron-density map was calculated by inverse Fourier transformation from the 
amplitudes and phases adopted from those of RHO-G4 using the SUPERFLIP 
software’ (Extended Data Fig. 5f). All 29 T atoms and 44 out of 70 oxygen atoms 
in the asymmetric unit of RHO-G5 were located from the 3D map by using the 
EDMA software”, and the remaining 26 oxygen atoms were added geometrically 
between the T atoms to complete the four-connected framework (Extended Data 
Fig. 5g). A similar approach was applied to generate the RHO-G6 structure 
based on the RHO-GS5 structure. The 3D electron-density map (Extended Data 
Fig. 5h) was calculated using the 742 strongest reflections with E> 1.2 
and d>1.00A from the RHO-GS structure, as given in Supplementary 
Table 17. The calculation of the indices of RHO-G6 follows the previous rules, 
hyyo-ce = hrno-cs X @gno-c6/@ruHo-Gs kruo-ce = kruo-cs X 4rvo-ce/Arvo-Gs 
Iay0-ce = Iruo-cs X 4guo-ce/@guo-cs, All 47 T atoms and 96 out of 112 oxygen 
atoms of RHO-G6 in the asymmetric unit were located from the 3D map, and the 
remaining 16 oxygen atoms were added geometrically between the T atoms to 
complete the four-connected framework (Extended Data Fig. 5i). All the structures 
in the RHO family were further energy-minimized in the pure SiO, forms using 
GULP (Supplementary Fig. 2, Supplementary Table 12), and are all energetically 
feasible. The corresponding unit-cell parameters for RHO-G1 to RHO-G6 are 
14.77 A, 24.58 A, 34.40 A, 44.22 A, 54.07 A and 63.87 A, respectively. The energy 
difference from quartz was as predicted, on the basis of the results of earlier studies 
using GULP that show a clear trend between energy and framework density’. The 
lattice energies for the RHO family are comparable with those for other zeolite 
structures built from 4- and 6-rings only, for example SOD, LTA, FAU, MER, 
FAU, KFI, CHA, PHI. This indicates that all structures in the RHO family are 
energetically reasonable. 

Gas adsorption experiments. The CO, CH4, and N2 adsorption isotherms of 
NaTEA-ZSM-25 and NaTEA-PST-20 were measured at 298 K and at pressures up 
to 1.2 bar using a Mirae SI nanoPorosity-XG analyser (Fig. 4a, b). Prior to the 
experiments, each zeolite sample was evacuated for 6 h at 523K. Adsorption 
kinetics and adsorption-desorption cycling of CO. were performed using a 
Setaram PCTPro E&E analyser. Prior to the experiments, the zeolite sample was 
evacuated for 6 h at 473 K. While kinetics of CO2 adsorption was determined at 
298 K and 1.2 bar (Fig. 4c), cyclic CO2 adsorption was repeated 100 times at 343 K 
and 1.2 bar in vacuum-swing-regeneration mode (Fig. 4a, b). 
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Extended Data Figure 1 | Characterization of ZSM-25 and PST-20 zeolites. a,b, PXRD patterns (left panels), *”Al (middle panels) and ”’Si (right panels) magic- 
angle spinning NMR spectra of as-made (bottom plots within each panel) and calcined (top plots within each panel) ZSM-25 (a) and PST-20 (b). 
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Extended Data Figure 2 | Reconstructed 3D reciprocal lattice fromthe RED 3D reciprocal lattice showing the (hk0) plane (b, e), (kh) (c) and (hkk) 
data. a-c, NaTEA-ZSM-25 and d-f, NaSrTEA-PST-20. a, d, The 3D reciprocal _(f) reciprocal plane. The distributions of the strong reflections for NaTEA- 
lattice with the crystal inset. b, c, e, f, 2D slices cut from the reconstructed ZSM-25 and NaSrTEA-PST-20 are similar to that of PAU. 
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Extended Data Figure 3 | Structure factor amplitudes and phases calculated The red, green and blue circles correspond to d-spacings of 1.0 A, 1.6 A and 
from the structure models of RHO-G1 to RHO-G6. The (hkh+k) reflections 3.0 A, respectively. The frameworks are idealized in the pure SiO, forms. 
are shown. Reflections in red and blue have phases of 0° and 180°, respectively. 
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Extended Data Figure 4| PXRD profiles for the Rietveld refinement of as- 
made and Na* -exchanged NaSrTEA-PST-20. Top, as-made NaSrTEA-PST- 
20. Bottom, Na” -exchanged NaSrTEA-PST-20 (denoted NaTEA-PST-20). 

The observed, calculated and difference curves are shown in blue, red and black, 


respectively. The good agreement of observed and calculated data at high angles 
(inset) indicates that the framework structure is correct. The slight differences 
at lower angles are due to incomplete determination of the positions of all 
guest molecules/cations (X-ray wavelength 1 = 0.40091 A). 
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Extended Data Figure 5 | The prediction of the RHO-family members structure factors of strong reflections from RHO-G(n — 1), which allowed a 
RHO-GI1 to RHO-G6 from the structure of PAU (RHO-G3). The arrows 3D structure model of RHO-Gn to be built. The structures of RHO-G1 and 
indicate how the structures were predicted from their nearest generations. The | RHO-G2 were obtained from RHO-G3 by model building. 

3D electron-density map of RHO-Gn (n = 4-6) was generated using the 
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Extended Data Figure 6 | Tile representations of the structures of RHO-G1 _ resulting in isoreticular expansion of the scaffold; and second, other cages are 
to RHO-G6 in the RHO family. The structure expansion operates at two embedded (middle) in the inter-scaffold space. The resulting frameworks 
levels: first, a pair of pau and d8r cages is inserted along each unit-cell edge (top) are denoted as ‘embedded isoreticular zeolite structures’ (bottom). 
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Extended Data Table 1 | Structure factor amplitudes and phases used for structure solution of ZSM-25 
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Structure factor amplitudes of the strongest reflections obtained from RED data, and the corresponding reflections and structure factor phases in the PAU structure. The amplitudes | Fosm-25 | were calculated 
as the square-roots of the intensities extracted from RED. The indices of the corresponding reflections in the PAU structure were obtained: hpay = hzsu-25 X apau/azsm-25, Kpau = Kzsm-25 X @pau/@zsm-25, 
Ipau = lzsm-25 X Apau/azsm-25, Where the unit cell apay = 35A and azsy.25 = 45A. 
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Extended Data Table 2 | Room-temperature CO3/CH, and CO2/N> selectivities 


CO,/CH, selectivity COz/No selectivity 
Materials 0.1 bar 1.0 bar 0.1 bar 1.0 bar 
NaTEA-PST-20 117 15 47 10 
NaTEA-ZSM-25 331 22 43 10 
NaTEA-ECR-18 105 11 49 9 
Na-Rho 142 23 319 31 
K-chabazite 13 3 42 7 


The selectivities are measured at 0.1 bar and 1.0 bar for NaTEA-PST-20, NaTEA-ZSM-25, NaTEA-ECR-18, Na-Rho and K-chabazite. The CO2/CH4 and CO2/Nz selectivities are defined as Qco, /QcH, ANd Qco, /Qn,+ 
respectively, where Qco,, Qc,, and Quy, are the respective equilibrium molar uptakes of COz, CH4 and Nz ata given pressure taken from the corresponding single component isotherms. 
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Conversion of amides to esters by the 
nickel-catalysed activation of amide C-N bonds 


Liana Hie', Noah F. Fine Nathel', Tejas K. Shah!, Emma L. Baker’, Xin Hong’, Yun-Fang Yang’, Peng Liu', 


K. N. Houk' & Neil K. Garg! 


Amides are common functional groups that have been studied for 
more than a century’. They are the key building blocks of proteins 
and are present in a broad range of other natural and synthetic 
compounds. Amides are known to be poor electrophiles, which is 
typically attributed to the resonance stability of the amide bond’”. 
Although amides can readily be cleaved by enzymes such as prote- 
ases’, it is difficult to selectively break the carbon-nitrogen bond of 
an amide using synthetic chemistry. Here we demonstrate that 
amide carbon-nitrogen bonds can be activated and cleaved using 
nickel catalysts. We use this methodology to convert amides to 
esters, which is a challenging and underdeveloped transformation. 
The reaction methodology proceeds under exceptionally mild reac- 
tion conditions, and avoids the use of a large excess of an alcohol 
nucleophile. Density functional theory calculations provide insight 
into the thermodynamics and catalytic cycle of the amide-to-ester 
transformation. Our results provide a way to harness amide func- 
tional groups as synthetic building blocks and are expected to lead 
to the further use of amides in the construction of carbon-hetero- 
atom or carbon-carbon bonds using non-precious-metal catalysis. 

The ability to interconvert functional groups is important in syn- 
thetic chemistry and many biological processes. Methodologies** have 
been developed that enable chemists to strategically harness the react- 
ivity of most functional groups. Likewise, breakthroughs in biochem- 
istry have led to an understanding of how changes in functional groups 
regulate physiological processes®. 

One particularly interesting dichotomy exists in considering the 
amide functional group’, which is the key component of all proteins 
(Fig. 1a). Since Schwann’s initial discovery of pepsin—the first enzyme 
to be discovered—in 1836, scientists have been intrigued by the ability 
of enzymes to break amide linkages**. Such amide cleavage processes 
govern many cellular regulatory functions and are responsible for the 
degradation of proteins to amino acids’’. In contrast, the synthetic 
chemistry of amide-bond cleavage has remained underdeveloped, 
even though amides are well suited for use in multistep synthesis 
because of their stability under a variety of reaction conditions. 
Commonly used methods to break amide carbon-nitrogen (C-N) 
bonds include the reductive conversion of amides to aldehydes using 
Schwartz’s reagent’ and the displacement of Weinreb’s N-OMe-N-Me 
amides with organometallic reagents en route to ketones®. Following 
Pauling’s seminal postulate regarding amide planarity’, the poor react- 
ivity of amides is now well understood as being a result of the strength 
of the resonance-stabilized amide C-N bond’. 

To circumvent the long-standing problem involving the low react- 
ivity of amides and their modest synthetic use in C-N bond cleavage 
processes, we designed the general approach shown in Fig. 1b. The 
C-N bond of amide 1 undergoes activation by a transition-metal 
catalyst. Following oxidative addition, the resultant acyl metal species 
2 is trapped by an appropriate nucleophile to furnish product 3, with 
the release of amine 4. This approach allows for the breakdown of 
amides, and renders amides useful synthetic building blocks. 
Although examples exist for the metal-catalysed C-heteroatom bond 


activation of acid chlorides’, anhydrides’, and 2-pyridyl esters’®, to 
our knowledge, the direct metal-catalysed activation of C-N bonds 
of amides is unknown. This is notable given the widespread use of 


O on 
- R’ oe = - R’ 
N N 
| 
R” 


Resonance stabilization of amides 


Methods for amide bond cleavage 


In nature 


Enzymatic hydrolysis 
(for example, proteases) 


| | 


Benefits to human health 
(for example, energy, natural functions) 


Using chemical synthesis 


Low reactivity of 
amides 


Minimal use of amides in 
C-N bond cleavage processes 


R’ ucleophile / 
Neon ~ + HN 
| Transition-metal Nuc ‘R” 
R” catalysis 
fs ” 3 4 
“ke i 
' Oo F ¢ Deconstruction 
Catalytic R ehamides 
amide C-N M—N 
bond activation Ln ‘R” ¢ Amides as synthetic 
2 building blocks 
c Amide =» Ester 
oO O R’ 
-R’ 4 yHo—R” ~ Nicatalysis + HN 
N SEEN oR” \ 
| R’ 
R” 
1 5 6 4 


e Readily available and stable substrates 

e Amide activation via Ni catalysis 
¢ Non-precious-metal catalysis 

e Amides as synthetic tools 
¢ Mild reaction conditions 


Figure 1 | Amide-bond cleavage using transition-metal catalysis. a, An 
illustration of the stability of amides and the contrast between how amides are 
used in nature and in chemical synthesis. b, Design of amide C-N bond 
activation to deconstruct amides and exploit them as synthetic building blocks 
(nuc, nucleophile; L,, ligands coordinated to transition metal; blue spheres, 
R’, R", R”, any carbon-based functional groups). c, Strategy for the conversion 
of amides to esters. 
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Figure 3 | Scope of our methodology. a, b, The scope of the amide-to-ester 
transformation was evaluated with respect to the amide substrate (a), and 
with respect to the alcohol nucleophile, using 7g as the amide substrate (b). 
Reactions were carried out with Ni(cod), (10 mol%), SIPr (10 mol%), substrate 


(100.0 mg, 1.00 equiv.), alcohol (1.2 equiv.), and toluene (1.0 M) at 80 °C for 
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12h. Yields shown reflect the average of two isolated experiments, except for 
entry 10; the yield for entry 10 was determined by 'H NMR analysis using 
hexamethylbenzene as an internal standard, owing to the volatility of the ester 
product. ¢-Bu, tert-butyl; p, para; m, meta; o, ortho. 


transition-metal catalysis in organic synthesis, where there exist many 
examples of catalytic transformations occurring smoothly in the pres- 
ence of amide linkages. 

We validate the strategy outlined in Fig. 1b through the conversion 
of amides to esters (Fig. 1c). Amide to ester conversion, much like 
transamidation'’”’, remains a challenging and underdeveloped syn- 
thetic transformation. Amides are often stable enough that esterifica- 
tion is difficult and requires the use of harsh acidic or basic conditions, 
while employing a large excess of nucleophile (for example, using the 
alcohol nucleophile as a solvent)’. Perhaps the most promising pro- 
tocol to achieve amide-to-ester conversions is Keck’s methylation/ 
hydrolysis sequence’’, although this methodology is limited to the 
synthesis of methyl esters. Esterifications using acyl aziridines'* and 
N-methylamides (albeit with activation by nitrosation)’* have also 
been reported. Here we demonstrate the nickel-catalysed conversion 
of amides to esters, which proceeds under exceptionally mild reaction 
conditions. In addition to establishing the scope of this methodology, 
we use density functional theory (DFT) calculations to predict whether 
the amide-to-ester conversion, or the reverse, is thermodynamically 
favoured. DFT calculations are also used to predict a plausible catalytic 
cycle. These experimental and computational studies not only sub- 
stantiate the notion of using non-precious-metal catalysis for the 
activation of amide C-N bonds, but also lay the foundation for further 
studies aimed at the strategic manipulation of amides as synthetic 
building blocks using catalysis. 

We examined the conversion of benzamides 7 to methyl benzoate 8a 
both computationally (using the “Gaussian 09’ software; see 
Supplementary Information) and experimentally (Fig. 2). Because 
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amides are known for their stability, we assessed whether the amide- 
to-ester conversion could be rendered thermodynamically favourable 
by the judicious choice of amide N-substituents. Using DFT methods, 
we calculated the change in Gibbs free energy AG for the reaction of 
amides 7 with methanol to give esters 8a and amines 4. Whether this 
transformation is favourable or not depends on the nature of the 
N-substituents (entries 1-8). Methanolysis of Weinreb amide 7d 
(entry 4) and N-arylated substrates 7f and 7g (entries 6-8) were found 
to be the most energetically favourable. In contrast, esterifications of 
N-alkyl amides 7a, 7b, and 7e were deemed thermodynamically un- 
favourable. This is in line with the experimentally measured equilib- 
rium constant for the reaction of N,N-dimethylbenzamide 7b and 
methanol (entry 2), in which the reverse reaction is thermodynamically 
favoured (see Supplementary Information for further discussion)'*. 
Encouraged by the unique ability of nickel to catalyse the activation 
of strong aryl-heteroatom bonds’, particularly those in phenol”, 
aniline*®’*’, and phthalimide” derivatives, we also calculated the 
activation free energies for acyl C-N bond oxidative addition of each 
amide substrate using nickel catalysis. The barriers calculated for com- 
mercially available N-heterocyclic carbene ligand SIPr (entries 1-8) 
reveal that the oxidative addition barriers are reasonable in some cases. 
We studied these reactions experimentally using 10 mol% Ni(cod),, 
10 mol% SIPr, 2.0 equivalents of methanol, and toluene as solvent at 
110°C for 12h. There was good agreement between our observations 
and computational predictions. No reaction or low yields were seen for 
substrates 7a—7e (entries 1-5). However, when the calculated AG and 
the oxidative addition barrier were favourable, substantial formation 
of product 8a was observed (entries 6 and 7). Coupling of substrate 7g 
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Figure 4 | Computational study of catalytic cycle. DFT methods were used to 
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gave a quantitative yield of product (entry 7), and further optimization 
showed that even with only 1.2 equivalents of methanol and a tem- 
perature of 80 °C, product formation occurred smoothly (entry 8) to 
give complete conversion to 8a. Importantly, no reaction takes place if 
either the precatalyst or the ligand are omitted, whereas the use of 
alternative N-heterocyclic carbene or phosphine ligands typically leads 
to lower yields or no reaction. We conclude that nickel catalysis is 
indeed operative in the amide activation/esterification process. 

Having determined the optimal reaction conditions, we examined 
the scope of the transformation with regard to the amide substrate 
(Fig. 3a). In addition to the parent benzamide (entry 1), substrates 
containing the electron-withdrawing trifluoromethyl or fluoride sub- 
stituents (entries 2 and 3) or the electron-donating methoxy or methyl 
substituents (entries 4 and 5) were well tolerated. The transformation 
also proceeded smoothly using meta- and ortho-methyl-substituted 
substrates to give the desired esters in excellent yields (entries 6 
and 7). Beyond the use of phenyl derivatives, we examined naphthyl 
and heterocyclic substrates. Naphthyl compounds readily coupled 
(entries 8 and 9), as did furan, quinoline, and isoquinoline substrates 
(entries 10-12, respectively). However, amides derived from alkyl 
carboxylic acids did not undergo the nickel-catalysed esterification 
under our reaction conditions. This attribute provides opportunities 
to realize selective amide C-N bond cleavages in more complex sub- 
strates (see below). 

A variety of N-substituents were also surveyed, as shown in Fig. 3a. 
In addition to the longer N-butyl (Bu) and the branched N-iso-propyl 
alkyl chains (entries 13 and 14, respectively), we found that a cyclic 
amide derived from indoline was tolerated by the methodology (entry 
15). Lastly, protected N-alkyl benzamides were tested. Although use of 
the N-p-toluenesulfonyl (Ts) derivative gave the corresponding ester 
in modest yield (entry 16), the corresponding N-tert-butyloxycarbonyl 
(Boc) substrate more efficiently underwent conversion to ester 8a 
(entry 17). The analogous N-benzyl, N-tert-butyloxycarbonyl 
(N-Bn,Boc) substrate was also evaluated and gave the desired ester 
in 89% yield (entry 18). These results show that the methodology is 
not restricted to anilide substrates, as long as the overall reaction 
energetics are thermodynamically favourable (see Supplementary 
Information for energetics involving the N-Boc,Me substrate). 
Moreover, secondary benzamides can be used strategically as sub- 


strates for esterification, following a straightforward activation step 
(Boc-protection). 

Using amide 7g as the substrate, we evaluated the scope of the 
methodology with respect to the alcohol nucleophile (Fig. 3b). As 
shown, synthetically useful yields of product were obtained using only 
1.2 equivalents of the alcohol, even when complex and hindered alco- 
hols were used. Cyclohexanol, t-butanol, and 1-adamantol coupled 
smoothly to give the corresponding esters (entries 19-21, respectively); 
tert-butyl esters can readily be hydrolysed to carboxylic acids under 
acidic conditions. Similarly, we found that cyclopropyl carbinol and an 
oxetane-derived alcohol could be used in the esterification reaction 
(entries 22 and 23, respectively). The use of the hindered secondary 
alcohol (—)-menthol was also tested and the desired ester was obtained 
in 88% yield (entry 24). Furthermore, we found that Boc-L-prolinol 
was tolerated in the methodology (entry 25), in addition to an indole- 
containing alcohol (entry 26), which further demonstrates the promise 
our methodology holds for reactions of heterocyclic substrates. As 
shown in entries 27 and 28, a complex sugar-containing alcohol bear- 
ing two acetals and an estrone-derived steroidal alcohol, respectively, 
also underwent the desired esterification reaction. 

Although nickel-catalysed aryl and acyl C-O bond activation pro- 
cesses have been previously studied computationally*”, no analogous 
studies involving C-N bond activation have been reported. Thus, to shed 
light on the mechanism of the facile amide-to-ester conversion, the 
catalytic cycle was computed using DFT calculations. Figure 4 provides 
the free energy profile using amide substrate 7g. The [Ni(SIPr)2] com- 
plex, 9, is believed to be the resting state of the catalytic cycle. Dissociation 
of one carbene ligand from complex 9 provides a coordination site for 
amide 7g. Following coordination to give intermediate 10, oxidative 
addition occurs via transition state 11. This key event cleaves the amide 
C-N bond and produces acyl nickel species 12. The next step of the 
catalytic cycle is ligand exchange, which proceeds by coordination of 
methanol to give intermediate 13. Subsequent ligand exchange via trans- 
ition state 14 facilitates the deprotonation of methanol, giving nickel 
complex 15. Dissociation of N-Me-aniline produces acyl nickel species 
16, which in turn, undergoes reductive elimination via transition state 17 
to deliver the ester-coordinated complex 18. Finally, the ester product 8a 
is released to regenerate catalyst 9. The rate-determining step in the 
catalytic cycle is the oxidative addition (transition state 11) with an 
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overall barrier of 26.0 kcal mol ' relative to the resting state 9. The overall 
reaction is thermodynamically favoured by —6.8 kcal mol’. Because 
decarbonylation of acyl nickel species have been observed”, we also 
calculated the kinetic barrier for decarbonylation events (see 
Supplementary Information). Consistent with experiments, decarbony- 
lation pathways from acyl nickel species 12 or 16 were found to be less 
favourable than the product formation pathways. 

As highlighted by the experiments shown in Fig. 5, the nickel- 
catalysed conversion of amides to esters can be used to achieve 
selective and mild amide-bond cleavages. First, we performed the 
esterification of bis(amide) substrate 19 using (-)-menthol (Fig. 5a). 
Although both amides are N-arylated benzamides, only the tertiary 
amide was cleaved to give ester 21, while also releasing aminoamide 22. 
Second, bis(amide) 23, which possesses two tertiary amides, was 
studied in the nickel-catalysed esterification reaction (Fig. 5b). In this 
case, the tertiary L-proline-derived alkyl amide was not disturbed, 
while the tertiary benzamide underwent cleavage to give ester 21 
and aminoamide 24 in good yields. Lastly, we prepared L-valine deriv- 
ative 25, which also bears an ester (Fig. 5c). Upon exposure of 25 to 1.2 
equivalents of (-)-menthol and the nickel-catalysed conditions, ester 
21 and aminoester 26 were obtained in 70% and 79% yields, respect- 
ively. We believe that the ester functionality withstands the reaction 
conditions because it is not attached to an arene, analogous to the lack 
of reactivity seen in our attempts to esterify amides derived from alkyl 
carboxylic acids (for example, 23). Compounds 24 and 26 were 
obtained in high enantiomeric excess, highlighting the mild nature 
of the reaction conditions, which avoid any substantial epimerization 
of the « stereocentres. 

We have presented an efficient way to convert amides to esters. The 
methodology circumvents the classic problem of amides being poorly 
reactive functional groups by using nickel catalysis to achieve the 
previously unknown catalytic activation of amide C-N bonds. DFT 
calculations support a catalytic cycle that involves a rate-determining 
oxidative addition step, followed by ligand exchange and reductive 
elimination. The methodology is broad in scope, particularly with 
respect to the alcohol nucleophiles, and proceeds under exceptionally 
mild reaction conditions using just 1.2 equivalents of the alcohol 
nucleophile. Moreover, selective amide-bond cleavage is achieved in 
the presence of other functional groups, including less reactive amides 
and esters, without the epimerization of « stereocentres. We envision 
that this methodology will lead to advances such as the catalytic 
esterification of primary amides, additional N,N-disubstituted amides, 
amides derived from alkyl or vinyl carboxylic acids, and perhaps 
even polyamide substrates bearing multiple stereocentres. This 
study should enable the further use of amides as valuable building 
blocks for the construction of C-heteroatom or C-C bonds using 
non-precious-metal catalysis. 
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Soils of the northern high latitudes store carbon over millennial 
timescales (thousands of years) and contain approximately double 
the carbon stock of the atmosphere’*. Warming and associated 
permafrost thaw can expose soil organic carbon and result in 
mineralization and carbon dioxide (CO,) release**. However, 
some of this soil organic carbon may be eroded and transferred 
to rivers’ ’. If it escapes degradation during river transport and is 
buried in marine sediments, then it can contribute to a longer-term 
(more than ten thousand years), geological CO, sink*”°. Despite 
this recognition, the erosional flux and fate of particulate organic 
carbon (POC) in large rivers at high latitudes remains poorly con- 
strained. Here, we quantify the source of POC in the Mackenzie 
River, the main sediment supplier to the Arctic Ocean’’”, and 
assess its flux and fate. We combine measurements of radiocarbon, 
stable carbon isotopes and element ratios to correct for rock- 
derived POC’®’*"*. Our samples reveal that the eroded biospheric 
POC has resided in the basin for millennia, with a mean radio- 
carbon age of 5,800 + 800 years, much older than the POC in large 
tropical rivers’*’*. From the measured biospheric POC content 
and variability in annual sediment yield’’, we calculate a biospheric 
POC flux of 2.2*}3 teragrams of carbon per year from the 
Mackenzie River, which is three times the CO, drawdown by 
silicate weathering in this basin’®. Offshore, we find evidence for 
efficient terrestrial organic carbon burial over the Holocene 
period, suggesting that erosion of organic carbon-rich, high-latitude 
soils may result in an important geological CO, sink. 
Photosynthesis and the production of organic carbon by the terrest- 
rial biosphere (OCpiosphere) is a major pathway of atmospheric CO, 
drawdown. Over millennial timescales, some OCpiosphere eScapes 
oxidation and contributes to a transient CO, sink in soil**’’. 
Longer-term CO, drawdown can be achieved if OCyiosphere is eroded, 
transferred by rivers and buried in sedimentary basins”'*'*””. Burial of 
OCpiosphere Fepresents a major geological CO, sink (and source of 
oxygen, O,) alongside the chemical weathering of silicate minerals 
by carbonic acid, coupled to carbonate precipitation’®”’. These fluxes 
negate CO emissions from the solid Earth”® and from oxidation of 
rock-derived OC”, contributing to the long-term regulation of global 
climate’*”®. Physical erosion is thought to play an important part in 
this OCpiosphere transfer because it controls the rate of biospheric par- 
ticulate organic carbon (POCpiosphere) export by rivers’ and influ- 
ences sediment accumulation and the efficiency of OC burial’®’*7*, 
In the northern high latitudes, large amounts of OCpiosphere are 
stored in soil’. The upper three metres of soil in the region of northern 
circumpolar permafrost are estimated to contain 1,035 + 150 peta- 
grams of carbon (PgC), approximately double the CO, content of 
the pre-industrial atmosphere”’’. Many of these soils accumulated 
during the retreat of large continental ice sheets following the Last 
Glacial Maximum, with a peak expansion between 12,000 and 8,000 


calibrated years before present (BP)”*, where ‘present’ is 1950, and the 
OCyiosphere Can be thousands of years old*. This vast carbon reservoir is 
located in a region sensitive to environmental change over glacial 
interglacial timescales” and to warming over the coming century’. 
Much focus has been placed on its potential to become a CO, 
source’ **, However, geological CO, drawdown by POCyiosphere €F0- 
sion at high latitudes has remained poorly constrained”. 

Here we sample POC carried by the major rivers in the Mackenzie 
basin and investigate its fate using an offshore sediment core extending 
over the Holocene (Extended Data Fig. 1). The Mackenzie River is the 
largest source of sediment to the Arctic Ocean'’’*"* and erosion of 
mountainous topography in the basin results in a high sediment dis- 
charge, similar to the combined total of 16 Eurasian rivers draining to 
the Arctic Ocean’*’. We collected river depth profiles to characterize 
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Figure 1 | Source of POC in the Mackenzie River basin. Radiocarbon activity 
of POC (Finoa) Versus the nitrogen to organic carbon ratio (N/OCyota) of 
suspended sediments from the Mackenzie River (circles) at the delta (black), at 
Tsiigehtchic (grey) and at Norman Wells (white) and from its major tributaries 
the Liard (diamond), the Peel (dark blue square) and the Arctic Red (light 
blue square). River depth profiles collected in 2010 and 2011 suspended load 
(filled symbols), river bed materials (open symbols) and sieved bank samples 
(collected in 2009, sizes shown on figure) are shown with analytical errors 

(2 standard deviations, s.d.) as grey lines if larger than the data points. The 
dashed line shows the compositions expected by mixing rock-derived POC ctr 
(black rectangle) and POCyiosphere (green shading). The solid green line is the 
trend from a peat core in western Canada”’. 
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POC across the range of grain sizes carried by large rivers'*'*”° at the 
main conduit for sediment export to the Arctic Ocean in the 
Mackenzie delta, at key points on the Mackenzie River and from its 
major tributaries (Extended Data Fig. 1). To investigate temporal vari- 
ability of POC composition, river depth profiles were collected shortly 
after ice break-up at the high/rising stage (June 2011) and during the 
falling stage (September 2010), while river surface and bank samples 
were collected in June 2009. 

To correct for rock-derived, ‘petrogenic’ POC (POC,etro), likely 
to be important in the Mackenzie basin’”*, we combine measure- 
ments of radiocarbon ('*C, reported as the ‘fraction modern’, Fynoa), 
total OC content ([OCyotai]), stable isotopes of OC (0 "Cog the nitro- 
gen to OC ratio (N/OC,a) and the aluminium to OC ratio 
(Al/OCyotai), all of which allow us to assess the age and concentration 
of POCpiosphere (see Methods)”'®'*'*??, Published ~C ages of surface 
samples from the Mackenzie River*’ (n = 5) vary between 6,010 yr and 
10,000 yr but the MC depletion caused by POC, ero versus aged 
POCpiosphere has not been assessed. We also examine the hydrodyn- 
amic behaviour of POC, using the aluminium-to-silicon ratio (Al/Si) 
ratio as a proxy of sediment grain size and mineral composition”®. 

We find that river POC is depleted in '*C throughout the Mackenzie 
basin (Extended Data Table 1). Fynoq values range between 0.28 
('*C age 10,106 + 42 yr) and 0.63 ("*C age 3,675 + 36yr) in the sus- 
pended load (n = 27) and between 0.12 (4c age 17,002 + 84yr) and 
0.16 (4C age 14,601 + 64yr) in the river bed materials (n = 4). To 
investigate the cause of this MC depletion, we examine the N/OC,otai 
ratio. Degradation of organic matter in soils can increase the relative N 
abundance”, differentiating degraded POCyiosphere (high N/OCiotal) 
from young, fresh POCpiosphere (low N/OCyotai). Suspended load sam- 
ples display a negative relationship between N/OCjota1 and Fimoa 
(Fig. 1), similar to measurements from a peat core in the Mackenzie 
basin” away from permafrost. There, N/OCjotal ratios increased with 
'C age (1,250-10,200yr) and soil depth (0-3 m). In contrast, river 
bed materials have lower Fyoa values and a relatively restricted range 
of N/OC,.ta1 values and are distinct from suspended load (Fig. 1). 
A dominance of POC,.1,. in bed materials'®'* with a N/OC, ota) ratio 
of about 0.07 can explain their composition. 

Together, the Fyrog and N/OCiota1 values suggest that POC in the 
Mackenzie River is a mixture of POC, 4,5 and POCpiosphere: itself vary- 
ing in EC age from ‘modern’ to about 8,000 yr old (Fig. 1). The 8 Cog 
values and Al/OC,.:a1 ratios support this inference (Extended Data 
Fig. 2). Using an endmember mixing analysis'®'* we quantify 
POC, tro Content of sediments (Methods) and find that suspended 
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load at the Mackenzie River delta is dominated by POCpiosphere 
(~70%-90% of the total POC). Having corrected for POCpctro, we 
investigate the source of POCpjiosphere by estimating its average ac C) 
age. This varies from 3,030+ 150yr to 7,900+ 400yr (Extended 
Data Fig. 3) with an average =C age of POCyiosphere = 5,800 + 800 yr 
(+2 standard errors, s.e.) in suspended sediments of the Mackenzie 
River delta. These values are older than estimates of POCpiosphere 
age from the Amazon River (1,120-2,750yr)’* and Ganges River 
(1,600-2,960 yr)'*. The ages reflect mixing of young, fresh 
POCyiosphere (present in each of these large river basins) with an older 
POCypiosphere in the Mackenzie basin (Fig. 1), likely to consist of peat 
soils that expanded between 9,000yr and 8,000yr ("*C age)”. 
POC iosphere can be eroded by slumping and landsliding on river banks, 
across deep soil profiles*’. Sections of the landscape that have discon- 
tinuous permafrost and those undergoing permafrost degradation” 
may be important sources of aged POCpiosphere: in addition to river 
banks, which are undercut during peak water discharge following ice 
break-up’*. Our samples suggest that erosion and fluvial transfer of 
millennial-aged POCpiosphere is extensive in the Mackenzie basin. 
Once in the river, POCyiosphere is sorted with river depth, revealed by 
the Al/Si ratio (Fig. 2b) a proxy for grain size”. In bed materials with 
low AI/Si, POC, eto dominates (Fig. 1) and leads to low Fyoq values 
(Fig. 2c). Just above the river bed, during the two sampling campaigns, 
coarse suspended sediments (low Al/Si) hosted the youngest, least 
degraded POCyiosphere (low N/C), leading to a large contrast in MC 
age from the bed materials. Towards the river surface, older, more 
degraded POCyiosphere appears to become more dominant, and is 
transported with fine sediment and clays (high Al/Si)**. The large 
contribution of degraded, very old POCpiosphere (5,000 yr) in the 
Mackenzie River contrasts with large tropical rivers where organic 
matter turnover in terrestrial ecosystems is more rapid (Fig. 2c)'*"*. 
To assess how erosion in the Mackenzie River may lead to long-term 
CO, drawdown, we estimate POC, iosphere discharge. River depth pro- 
files collected at the high and falling stages suggest that the [OC,,¢ai] of 
the suspended sediment load did not vary systematically with sedi- 
ment grain size (Extended Data Fig. 4). Future work should seek to 
assess temporal variability in POC content and composition. Our 
data suggest that changes in grain size with water discharge (Fig. 2b) 
could be important in setting the variability of POC) ;osphere age carried 
by the river (Fig. 2c). The [OCjota1] values at the Mackenzie delta were 
1.6 + 0.5% (n = 8, + 10), which were similar to the mean measured in 
the Mackenzie delta in June-July 1987 of 1440.2 (n= 10)”. 
Although our sample set is modest in size, it helps us to better constrain 
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Figure 2 | Transport of POC in the Mackenzie River. a, River depth profile 
collection from the Mackenzie River delta during the falling stage, with 
Acoustic Doppler Current Profiler data used to determine channel geometry, 
water velocity and water discharge. b, Aluminium to silicon ratio (Al/Si), a 
proxy for sediment grain size*’, with water depth normalized to maximum 


depth. Coarser materials are carried throughout the profile during the high 
stage. c, Radiocarbon activity of POC (Fmoa) versus Al/Si for the Mackenzie 
basin (this study, symbols as in Fig. 1), Amazon River", and Ganges River'®”’. 
River suspended load (filled symbols) and river bed materials (open symbols) 
are distinguished. Analytical errors (2 s.d.) are smaller than the data points. 
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the range of POC contents in the suspended load of the Mackenzie 
River. In addition, our endmember mixing analysis allows us to 
provide the first estimates of [OCpiosphere], which varies between 
0.7 + 0.1% and 2.4 + 0.2%. 

To estimate POCpjiosphere discharge, we use the most complete data 
set of annual sediment discharge to the Mackenzie delta (1974-1994), 
which ranged from 81 teragrams per year (Tgyr ') to 224Tgyr '. A 
Monte Carlo approach is used to account for the modest sample size by 
using the full measured variability in both [OCyiosphere] and annual 
sediment discharge (Methods). We estimate POC, iosphere discharge to 
be 2.2*}'3 teragrams of carbon per year (Tg Cyr’ '), which is sustain- 
able over 1,000 to 10,000 years, depleting the soil carbon stock by 
~0.006% per year (Methods). We estimate the POC,¢1;. discharge to 
be 0.4*9 TgCyr |. These estimates do not account for ice-covered 
conditions, when <10% of the annual sediment discharge is con- 
veyed’, Nevertheless, our estimate of POCyiospnere discharge is greater 
than the combined POC discharge of around 1.9TgCyr ‘ by the 
major Eurasian Arctic rivers (Ob, Yenisei, Lena, Indigirka and 
Koyma)""”* which cover approximately 8.6 million square kilometres. 
According to the available measurements, the Mackenzie River dom- 
inates the input of POCyiosphere to the Arctic Ocean. 

The mobilization of millennial-aged POC);osphere from soils at high 
latitudes has been viewed as a short-term source to the atmosphere if 
decomposition releases greenhouse gases (CH, and CQ,)***. 
However, if POCyiosphere escapes oxidation during river transport 
and is buried offshore, erosion acts as a long-term CO, sink'®"*°??, 
Offshore, aged POCyiosphere from the Mackenzie River (Fig. 1) can 
explain the '*C depletion and 5'°C of bulk organic matter, and old 
‘4C ages of terrestrial plant wax compounds (up to 20,000 yr) in sur- 
face sediments of the Beaufort Sea””’”’. We provide new evidence that 
terrestrial POC is buried efficiently offshore and accumulates in sedi- 
ments over 10,000 years. Benthic foraminifera Mc ages in a borehole 
located at the head of the Mackenzie trough (MTW01) indicate that 
21 m of sediment have accumulated since 9,183 * {22 calibrated years BP, 
suggesting a high sedimentation rate during the Holocene of 
2.7 + 0.1m per thousand years (Extended Data Table 2, Methods). 
These marine sediments have [OC; ota] values similar to those mea- 
sured in the Mackenzie River in both the <63 tum (1.5% to 1.7%) and 
>63 pm (1.1% to 1.4%) size fractions (Fig. 3). Their N/OC,,¢a1 and 
OCsg values suggest that they are dominated by terrestrial POC with 
minor marine OC addition (Extended Data Fig. 5). We use the change 
in OCjota)/Al ratios offshore to estimate OC burial efficiencies to have 
been 65 + 27% or more over the Holocene at this site (Methods). 
Rapid sediment accumulation and low temperature are likely to pro- 
mote high POC burial efficiency'**”. Also, the fluvial transport 
dynamics of POCyiosphere May promote burial (Fig. 2c). The oldest, 
most-degraded POChiosphere is transported with clays”*, whose asso- 
ciation with organic matter may enhance burial efficiency”’, while the 
youngest, least-degraded POCpiosphere is carried near the river bed at 
the highest sediment concentrations. Our findings suggest that ero- 
sion and riverine transfer at high latitudes can lead to the long-term 
preservation of terrestrial POC in marine sediments (Fig. 3). 

Erosion of high latitude soils and riverine export of POCpiosphere May 
represent an important geological CO, sink. Our estimate of the mod- 
ern day POCyiosphere discharge of 220s TgCyr | in the Mackenzie 
River may be refined by additional temporal sampling. However, it is 
three times the modern rates of CO, drawdown by weathering of 
silicate minerals by carbonic acid in the Mackenzie River’, at around 
0.7TgCyr *. Preservation of POC offshore (Fig. 3) suggests that the 
erosion of high-latitude soils, riverine POC) \osphere transport and export 
to the ocean acts as the largest geological CO, sink operating in the 
Mackenzie basin. We note that these longer-term fluxes are lower 
than estimates of greenhouse gas emissions from high-latitude soils 
in permafrost zones, owing to projected warming over the coming 
century’***!, While these fluxes remain uncertain, recent work” has 
proposed emissions of around 1-2 PgC yr’ ' which equate to a yield of 
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Figure 3 | Fate of particulate organic carbon offshore. a, Percentage organic 
carbon concentration of suspended sediments in the Mackenzie River delta 
(n = 8) where solid line and grey box show the mean = s.e., whiskers show 
+s.d. and the circles indicate the minimum and maximum values. 

b, Percentage [OC], ota) in sediments <63 um and >63 pm from core MTWO1 
in the Mackenzie trough (Extended Data Fig. 1) for depths dated by the “*C 
activity of mixed benthic foraminifera (Methods), where whiskers show the 
analytical error if larger than the data point size. 


~70 tonnes of carbon per square kilometre per year (tC km * yr“ ') 
over 17.8 X 10° square kilometres of soils in permafrost zones. This 
estimate of accelerated release of CO, due to anthropogenic warming”’ 
is more rapid than the natural geological drawdown fluxes, of which we 
estimate a value for POCpiosphere Of 2-5 t C km~” yr for the Mackenzie 
basin (Methods). Over longer time periods, we postulate that this geo- 
logical CO, sink may be sensitive to climate conditions in the Arctic. The 
carbon transfer can operate when high latitudes host substantial 
POCyiosphere Stocks in soil, and when rivers can erode and transfer sedi- 
ments to the Arctic Ocean. Over the last million years, the POCpiosphere 
transfer is likely to have been enhanced during interglacials** (Fig. 3), 
whereas during glacial conditions, lower soil POCpiosphere Stocks and 
extensive ice-sheet coverage suggest that POCyiosphere Erosion may have 
been suppressed. We propose that erosion of terrestrial POCyiosphere 
by large rivers draining the Arctic could be important in long-term 
CO, drawdown’, coupling the carbon cycle to climatic conditions 
at high latitudes. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 

River sample collection and preparation. River depth-profiles from September 
2010 and June 2011 (Extended Data Table 1) were used to collect the full range of 
erosion products and POC in large river systems, taking advantage of the hydro- 
dynamic sorting of particles’*’*"*”*. At each sampling site (Fig. 1), channel depth, 
water velocity and instantaneous water discharge were measured by two or more 
transects with an Acoustic Doppler Current Profiler (ADCP Rio Grande 600 kHz) 
before each depth profile was collected at a single point (+10 m) in the middle of 
the channel. On the boat, each sample (~7-8 litres) was evacuated into a clean 
bucket and stored in sterilized plastic bags and the procedure was repeated 
depending upon the total water depth. Each bag was weighed to determine the 
sampled volume, then the entire sample was filtered within 24 h using pre-cleaned 
Teflon filter units through 90mm diameter 0.2,1m PES (polyethersulfone) 
filters'*?°. Suspended sediment was immediately rinsed from the filter using fil- 
tered river water into clean amber-glass vials and kept cool. River bed materials 
were collected at the base of the depth transects from the boat, using a metal bucket 
as a dredge, and were decanted to a sterile bag. Riverbank deposits (June 2009) 
were collected from fresh deposits close to the channel (Extended Data Table 3) 
and sieved at 250 um, 150 um and 63 im to investigate the sorting of POC”. All 
sediments were freeze-dried upon return to laboratories within two weeks, 
weighed and homogenized in an agate grinder. 

Offshore borehole sample preparation. Marine sediment samples containing 
benthic foraminifera were obtained from the upper 22m Holocene sequence of 
an 85.1-m MTWO1 borehole® located at 69° 20’ 53’’ N, 137°59'13"’ in 45-m 
water depth in the Mackenzie trough (Extended Data Fig. 1). Drilled by the 
Geological Survey of Canada in 1984, the core is currently archived at the GSC- 
Atlantic core repository. To isolate foraminifera, sediment samples were disag- 
gregated over a sieve with <38 jum mesh using deionized water. On the basis of 
microfossil counts, four samples were selected with sufficient specimens for radio- 
carbon dating. 

Geochemical analyses. For the river-suspended sediments and core samples for 
organic carbon analyses, inorganic carbon was removed using a HC] fumigation 
technique to avoid the loss of a component of POC that is known to occur during a 
HCl leach*’. A method adapted to ensure full removal of detrital dolomite was 
used**. In summary, samples were placed in an evacuated desiccator containing 
about 50 ml 12 N HCl in an oven at between 60 °C and 65 °C for 60-72 h. Samples 
were then transferred to another vacuum desiccator charged with indicating silica 
gel, pumped down again and dried to remove HCl fumes. River sediment samples 
were analysed for organic carbon concentration [OC;otai] on acidified aliquots and 
percentage nitrogen concentration [N] on non-acidified aliquots by combustion at 
1,020 °C in O using a Costech elemental analyser in Durham. For river depth 
profile samples, acidified aliquots were prepared to graphite at the NERC 
Radiocarbon Facility of 1-2mg C for each sample and standard and '*C was 
measured by Accelerator Mass Spectrometry at the Scottish Universities 
Environmental Research Centre and reported as the fraction modern Fynoa by 
standard protocol’*. Process standards (96H humin) and background materials 
(bituminous coal) were taken through all stages of sample preparation and *C 
analysis and were within 2o uncertainty of expected values. Stable isotopes of POC 
(8° Cong) were measured by dual-inlet isotope ratio mass spectrometer (IRMS) on 
an aliquot of the same CO). These measurements were consistent with as Ome 
measurements made by an elemental analyser IRMS, normalized to measured 
standard values (n = 7) spanning >30%o and long-term analytical precision of 
0.2%o. Riverbank samples from 2009 were analysed by similar procedures at the 
National Ocean Sciences Accelerator Mass Spectrometry Facility (NOSAMS) at 
Woods Hole Oceanographic Institution. 

Mixed benthic foraminifera samples chosen from the MTWO1 core were ana- 
lysed at NOSAMS for '*C analyses. Samples were rinsed and no pre-treatments 
were used. The samples were directly hydrolysed with strong acid (H;PO,) to 
convert the carbon in the sample to CO;. Calibration of the *C dates was per- 
formed using CALIB (version 7.1)*”. All MC dates were normalized to a 8'°C of 
—25%bo versus VPDB (http://intcal.qub.ac.uk/calib/). Foraminifera dates were cali- 
brated using the MARINE13 data set**, with a reservoir age correction (AR) of 
335 + 85 yr (Extended Data Table 2). The AR value is based on a recent reanalysis 
of ages from 24 living molluscs collected before 1956 from the northwestern 
Canadian Arctic Archipelago”. This calibration set does not include specimens 
from the Beaufort Sea and as such provides only a best available estimate for AR in 
the Mackenzie trough. 

Endmember mixing model. The Fyioa, N/OCyotai (Fig. 1), 3° Corg values and Al/ 
OCyotat Values (Extended Data Fig. 2) are consistent with a mixing of POC, .t;5 and 
POCyiosphere dominating the bulk geochemical composition of river POC. 
Autochthonous sources are not an important component based on those mea- 
sured values, which is consistent with the turbid nature of the Mackenzie River 
(mean suspended sediment concentration of ~300-400 mg per litre), meaning 


that like other turbid river systems (for example, the Ganges-Brahmaputra) light 
penetration is minimal. A mixture of POC... and POCyiosphere can be described 
by the governing equations'****: 


Aviosphere + fpetro =1 (1) 


Joiosphere x Opiosphere Ws Fpetro Ipetro = Osample (2) 


where foiosphere ANd fretro are the fractions of POC derived from biospheric and 
petrogenic sources, respectively. O,ample is the measured composition (for example, 
Fmoa) of a river POC sample, and Opiosphere ANd Opetro are the compositions of 
biospheric and petrogenic sources. To quantify the f,etro in each sample we use 
the aluminium (Al) to OC,otai concentration ratio in river sediments. At each 
locality, a linear trend between Fmoq and Al/OCiota1 (Extended Data Fig. 2b) can 
be explained by a mixture of an Al-rich, OC-poor material (rock fragments con- 
taining POC, t;.) with Al-poor, OC-rich material (soils and vegetation debris as 
POCyiosphere)» Taking advantage of the fact that the POCpetro has Fmoa~ 0 
(unmeasurable above the background 4C content), the intercept at Finoa = 0 gives 
an estimate of the Al/OC,,:a) values and associated uncertainty of the sedimentary 
rock endmember. To estimate the average concentration of OC, ¢t;. of bedrocks in 
each basin, we use the Al concentration of river bed materials as a proxy for the Al 
concentration in the bedrocks”* and the Al/OC,,t1 value at Finog ~ 0. Following 
previous work in large rivers, we then assume that the OC,,.1. is well mixed in the 
water column and hasa relatively constant [OC,,.1;.] value'”'*"*. This method may 
overestimate fretro if OCpetro has been more extensively oxidized in fine-grained 
weathering products carried in the suspended load’. fpretro is quantified using 
[OC, ctro] and measured [OCiotai]. 

The mixing analysis returns a [OCpetro] = 0.12 + 0.03% (42a) in the Liard 
River and Mackenzie River at Tsiigehtchic, higher values in the Peel River 
[OC,ctro] = 0.63 + 0.30%, with the Mackenzie River at the delta having an inter- 
mediate value of [OC,ctro] = 0.29 + 0.05%. This is consistent with the known 
presence of POC,.1;o-bearing sedimentary rocks in the Mackenzie River basin 
and high OC,,ta1 contents of bedrocks in the upper Peel River basin and 
Mackenzie mountains”. To quantify the average '*C age of POC iosphere in each 
sample, equations (1) and (2) can be solved for Opiosphere» using the fretro Value and 
assumed Froq = 0 of POC) etro. The uncertainty mainly derives from that on fpetro 
and [OCyota] and has been propagated through the calculations. 

To test whether the mixing of POCpiosphere aNd POCpetro can describe the 

composition of the suspended load samples, we predict the 5'°C,.g measurements 
that were not used in the mixing analysis. The calculated fy... values and end- 
member values of —26.2 + 0.5%0 for POChpiosphere and — 28.6 + 0.5%o for POC petro 
were used, informed by measurements of bedrocks“’ and vegetation and soil in the 
basin*'. The mixing model (equation (2)) can robustly predict the 8° Corg differ- 
ences between the Peel and Liard rivers, and between suspended load and bed 
material 8'°C,,, values (Extended Data Fig. 2c), supporting a mixing control on 
the variables. 
Mackenzie River POC discharge. To quantify the discharge of POC we need to 
account for the variability in suspended sediment discharge and the variability in 
the POCyiosphere and POCyetr content of sediments in the basin. We use the 
longest, most complete quantification of sediment flux by the Mackenzie River 
from 1974-1994 (ref. 15), which has an annual average 127 + 40 Tg yr! (+1o). 
Annual sediment yield varied from 81 Tg yr ' to 224Tgyr *. Although the POC 
samples were not collected at the same time period, our measurements of [OC,ota1] 
at the delta (mean 1.6 + 0.5%; m = 8, +1¢) do not vary systematically between the 
falling and high stage (Extended Data Fig. 4) and are consistent with available data 
from samples” collected in 1987 (1.4 + 0.2, n= 10). 

While future work should aim to constrain the variability in POC composition 
further, these observations suggest that temporal variability may be less important 
than the potential variability in [OC;ora1] with depth at a given time, where we find 
[OCtorai] values can range from 1.0% to 2.7%. We use our measured range of 
[OCyiosphere] and [OCyetro] values and the full range of annual sediment yields’ 
to quantify POCpiosphere and POC petro discharge and the associated uncertainty 
using a Monte Carlo approach. Over 100,000 simulations, we use a ‘flat’ probabil- 
ity for the range of values for both variables (that is, equal probability of all 
measured values). This allows us to fully explore the range of estimates given 
the available measurements. Future work seeking to expand the number of 
[OCyiosphere] Measurements to assess its flux-weighted mean and variability, while 
assessing temporal variability in more detail, will allow POC discharge estimates 
and their uncertainty to be refined. POC iosphere (2.2 +03 TgCyr_') and POC tro 
(0.4*§:} Tg Cyr’) discharges are reported as the median (50%) + 1 s.d. Over the 
sediment source areas of the Mackenzie (downstream of the Great Slave Lake’*) of 
774,200 km”, these equate to yields of POCyiosphere = 2.9117 tCkm ” yr! and 
POCpetro = 0.693 tCkm 7 yr? 


org 
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The total POC discharge is slightly higher than a previous estimate 

(2.1 TgCyr *)" based on measurements of POC content made in 1987 because: 
(1) we account for higher POC, iosphere concentrations which may occur in water- 
logged POC,iosphere near the river bed (Fig. 2c, Extended Data Fig. 4); and (2) we 
account for the potential for very high annual sediment discharge’*. Based on 
estimates of soil carbon stock in the Mackenzie basin? of ~50 X 10°tCkm * 
and the upstream sediment source area (downstream of the Great Slave Lake, 
774,200 km”), the present rate of POCyiosphere €Xport represents a depletion of 
the soil carbon stock by ~0.006% per year, which is sustainable over 1,000 to 
10,000 years. 
OC burial efficiency in MTWO1. To estimate the burial efficiency of terrestrial 
POC at the MTW0O1 site, we normalize the measured [OC,.ta)] concentrations 
(Fig. 3) by Al concentration; Al is an immobile inorganic element hosted by major 
mineral phases. The OCiotai/Al normalization allows the effects of dilution to be 
distinguished from net OC gain (increased ratio) or OC loss (decreased ratio). 
The mean OCiota/Al of the MTWO1 samples was 0.17 + 0.02 gCperg Al 
(n=4,+2 s.e.). This is lower than the mean OCiotai/Al of the suspended 
load samples from the Mackenzie River delta of 0.26+0.10gCperg Al 
(n = 8, + 2s.e.). The decrease in the ratio offshore may suggest a higher relative 
proportion of POC,,.1:¢ (Extended Data Fig. 2b); however, this is not consistent 
with the less negative 8° Cong values (Extended Data Fig. 5). The decrease can 
therefore be interpreted in terms of OC loss, with the ratio of core to river samples 
being 0.17 + 0.02/0.26 + 0.10. 

Assuming that all the change in OC;o¢a)/Al is driven by OC loss, and taking into 
account the measurement variability in these values, we estimate that 65 + 27% of 
the OC has been preserved. However, we note that the OC,,)/Al ratios in the core 
are not statistically different from the river suspended load samples (one-way 
ANOVA, P> 0.1) which suggests that the OC burial efficiency could be higher 
(that is, 100%). In addition, if we use the OC;,,)/Al of finer river sediments carried 
near the channel surface—which may be more easily conveyed offshore—of 
020+ 0.04g¢° (n= 4, + 25.e.), we calculate burial efficiency to be 85 + 20%. 
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Future work should seek to better constrain these burial efficiencies with addi- 
tional terrestrial and marine samples. Nevertheless, despite the remaining uncer- 
tainty, these high burial efficiencies'* are consistent with the high sedimentation 
rate and low temperature setting. The long-term burial of POC delivered to sites 
deeper in the Beaufort Sea” still remains to be assessed to provide a complete 
picture of source-to-sink carbon transfers. 
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Extended Data Figure 1 | The location of river depth profiles collected from _ elevation model GMTED 15 arcsec with upstream sediment source catchment 
the Mackenzie River. Three locations along the Mackenzie River were areas delineated by flow accumulation and flow direction outputs from the 
sampled (circles) at the delta (black), Tsiigehtchic (grey) and Norman Wells digital elevation model (dotted lines). The Great Slave Lake is indicated 
(white) in addition to the major tributaries, the Liard River (red diamond), upstream of the Liard confluence and acts as an effective sediment trap in the 
Arctic Red River (light blue square) and Peel River (dark blue square). The basin'®. b, Permafrost zone coverage in the upstream areas of the basin”. 
location of the sediment core MTW01 from the Mackenzie trough is White rectangle shows the sample locations near the Mackenzie delta displayed 
shown (triangle). a, Major river channels (black lines) overlain on digital in c, overlain on LANDSAT imagery. 
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Extended Data Figure 2 | Source of particulate organic carbon in the 
Mackenzie River basin. a, Radiocarbon content (reported as F,,5q) as a 
function of the stable isotope ratio of organic carbon (8° Corg) of river 
sediments for the Mackenzie River (circles) and its major tributaries (diamonds 
and squares) for suspended load samples from river depth profiles (filled 
symbols) and river bed materials (open symbols). Dashed lines and shaded 
regions show hypothetical compositions produced by mixing rock-derived 
POC petro” and POCyiosphere’» b, Fmoa a8 a function of Al/OC;ota- High 
Al/OCiotai and low Fog correspond to the petrogenic source of POC 

(POC, ¢t:.)- Linear trends are shown for the Peel and Arctic Red rivers (blue, 
y=(-15+ 0.3 X 10 °)x + (0.85 + 0.11), 7° = 0.85, P< 0.02), the Mackenzie 
River at delta (black, y = (—5.9 + 0.5 X 10 °)x + (0.65 + 0.03), ° = 0.95, 


P<0.001), and the Mackenzie and Liard rivers (grey, 

y=(-2.3+0.3X 10%) x + (0.56 + 0.03), 7 = 0.82, P< 0.001). The 
intercepts at Fog = 0 for POC, ctr. are given with uncertainty (+1 s.d.) and are 
different for each sub-basin, reflecting the distribution of organic carbon-rich 
rocks in the Mackenzie mountains”. c, Measured 8° Cong versus those 
predicted by the endmember mixing model (EMM-predicted) (equations (1) 
and (2); Methods). The good agreement between measured and predicted 
values within the uncertainty on the measurements suggests that mixing of 
POC etro and POCpiosphere Can explain the first-order variability in 3° Corg 
values between catchments and between suspended load and river bed 
materials. 
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Extended Data Figure 3 | Radiocarbon age of biospheric particulate organic _ each sampling location as a narrow rectangle. The distribution of published 


carbon in the Mackenzie River derived from the mixing analysis. The basal peat sample '“C ages for the Mackenzie River basin’* is shown as 
number of POC, iosphere Measurements of a given range of “Cages is shown for wide rectangles. 
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Extended Data Figure 4 | River particulate organic carbon in the Mackenzie basin. Organic carbon concentration as a function of Al/Si, which is a function of 
grain size in the Mackenzie River basin”. Analytical errors (2 s.d.) are shown as grey lines if larger than the point size. 
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Extended Data Figure 5 | Stable isotope composition and nitrogen to 
organic carbon ratio of terrestrial and marine sediments. Suspended 
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(grey) and Norman Wells (white) are shown. Marine sediment samples 
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grey >63 ,1m) are shown with published surface sediment samples from the 
Beaufort Sea (white triangles) and Davis Strait (black squares)”. The terrestrial 
POC field shows an indicative range of values measured in the Mackenzie 
River. The marine OC field shows values expected for Arctic Ocean marine OC. 
Analytical errors (2 s.d.) are shown as grey lines if larger than the point size. 
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Extended Data Table 1 | River suspended sediment and bed material samples from the Mackenzie basin in 2009-2011 


Sample ID River Location Date Lat. Long, Typet DOSS ALM lO) AoC N/OC wa Fon! Publication 

CAN10_28 Mackenzie Delta 09/09/2010 68.4092 134.0805 SL 19 513 56947 0.21 142 + O11 -26.7 0.068 + 0.005 0419 + 0.002 SUERC-43077 
CANI0_29 Mackenzie Delta 09/09/2010 68.4092 134.0805 SL Le 275 77165 0.32 149 + 0,12 -26.6 0.085 + 0.007 0.377 + 0.002 SUERC-43078 
CANI10_31 Mackenzie Delta 09/09/2010 68.4092 134.0805 SL 6 251 79441 0.34 150 + 012 -26.6 0.086 + 0.007 0.363 + 0.002 SUERC-43079 
CAN10_32 Mackenzie Delta 09/09/2010 68.4092 134.0805 SL 0 162 90608 0.40 1.42 + O11 -26.6 0.099 + 0.008 0.307 + 0.002 SUERC-43080 
CANI1_87 Mackenzie Delta 13/06/2011 68.4092 134.0805 SL 20 848 43833 0.16 2.71 + 0.22 -26.5 0.045 + 0.004 0.568 + 0.003 SUERC-43100 
CANI1_88 Mackenzie Delta 13/06/2011 68.4092 134.0805 SL 15 850 42155 0.15 1.00 + 0.08 -26.6 0.065 + 0.005 0.366 + 0.002 SUERC-43101 
CANI1_89 Mackenzie Delta 13/06/2011 68.4092 134.0805 SL 8 240 62214 0.25 154 + 0.12 -26.5 0.077. + 0.006 0.401 oo 0.002 SUERC-43102 
CANI1_90 Mackenzie Delta 13/06/2011 68.4092 134.0805 SL 0 119 74070 0.31 1.65 + 0.13 -26.2 0.087 + 0.007 0.365 a 0.002 SUERC-43103 
CANI10_38 Mackenzie Delta 09/09/2010 68.4092 134.0805 BM Thalweg - 32258 0.10 0.37 + 0.03 -28.0 0.076 + 0,006 0.120 + 0.001 SUERC-46907 
CANO09_55 Mackenzie Tsiigehtchic 23/07/2009 67.4530 133.7405 SL 0 - - - 1.50 0.03 -26.6 0.104 + 0.004 0.2170 + 0.001 OS-78930 

CANI0_16 Mackenzie Tsiigehtchic 07/09/2010 67.4530 133.7405 BM Thalweg - 27913 0.08 0.16 + 0.01 -28.0 0.063 + 0.005 0.162 + 0.001 SUERC-46902 
CAN10_10 Mackenzie Tsiigehtchic 07/09/2010 67.4530 133.7405 SL 23 255 80393 0.34 1.41 + O11 -26.6 0.090 + 0.007 0.342 + 0.002 SUERC-43071 
CANI0_15 Mackenzie Tsiigehtchic 07/09/2010 67.4530 133.7405 SL 0 231 83833 0.36 1.42 + O11 -26.6 0.094 + 0.008 0.354 + 0.002 SUERC-43072 
CANI1_65 Mackenzie Tsiigehtchic 11/06/2011 67.4530 133.7405 SL 13 941 39509 0.13 1.62 + 0.13 -26.4 0.043 + 0.003 0.570 + 0.003 SUERC-43091 
CANI1_66 Mackenzie Tsiigehtchic 11/06/2011 67.4530 133.7405 SL 10 445 50607 0.18 140 + O11 -26.4 0.066 + 0.005 0.444 + 0.002 SUERC-43092 
CANI1_67 Mackenzie Tsiigehtchic 11/06/2011 67.4530 133.7405 SL 5 322 56614 0.22 1.43 + O11 -26.6 0.077 + 0.006 0.456 + 0.002 SUERC-43093 
CANI1_68 Mackenzie Tsiigehtchic 11/06/2011 67.4530 133.7405 SL 0 291 59948 0.24 154 + 0,12 -26.5 0.076 + 0.006 0.451 + 0.002 SUERC-43097 
CANI0_39 Mackenzie Norman Wells 10/09/2010 65.2650 126.7594 SL 3.15 168 65521 0.26 1.31 + 0.10 -26.7 0.087 + 0.007 0417 + 0.002 SUERC-43082 
CAN10_40 Mackenzie Norman Wells 10/09/2010 67.4530 126.7594 SL i} 136 73513 0.31 139 + O11 -26.6 0.091 + 0.007 0.411 + 0.002 SUERC-43083 
CANI0_50 Liard Fort Simpson 13/09/2010 61.8234 121.2976 BM Thalweg - 26796 0.07 0.14 + 0.01 -28.2 0.070 + 0.006 0.155 as 0.001 SUERC-46906 
CANI0_46 Liard Fort Simpson 13/09/2010 61.8234 121.2976 SL 4.8 492 38265 0.12 149 + 0.12 -26.6 0.045 + 0.004 0.633 as 0.003 SUERC-43086 
CANI0_49 Liard Fort Simpson 13/09/2010 61.8234 121.2976 SL 0 79 73830 0.29 2.00 + 0.16 -26.6 0.077. + 0.006 0.445 + 0.002 SUERC-43087 
CANI1_03 Liard Fort Simpson 04/06/2011 61.8234 121.2976 SL 6.5 490 56524 0.21 1.43 + O11 -26.4 0.075 + 0.006 0.481 + 0.002 SUERC-43088 
CANI1_05 Liard Fort Simpson 04/06/2011 61.8234 121.2976 SL 3.5 542 55058 0.21 1.51 + 0.12 -26.4 0.069 + 0.006 0.465 = 0.002 SUERC-43089 
CANI1_07 Liard Fort Simpson 04/06/2011 61.8234 121.2976 SL 0 438 58509 0.23 147 + 0.12 -26.4 0.073 + 0.006 0.4520 + 0.002 SUERC-43090 
CANI10_07 Peel Fort McPherson 07/09/2010 67.3313 134.8656 BM Thalweg - 34205 0.10 0.75 + 0.06 -28.0 0.068 + 0.005 0.133 + 0.001 SUERC-46905 
CAN10_03 Peel Fort McPherson 07/09/2010 67.3313 134.8656 SL 8.5 250 58694 0.20 2.00 + 0.16 -26.8 0.071 + 0.006 0.383 + 0.002 SUERC-43069 
CANI0_06 Peel Fort McPherson 07/09/2010 67.3313 134.8656 SL 0 101 76053 0.29 2.24 +4 0.18 -26.8 0.080 + 0.006 0.284 + 0.002 SUERC-43070 
CANI1_77 Peel Fort McPherson 11/06/2011 67.3313 134.8656 SL 6 325 58519 0.21 2.27 + 0.18 -26.8 0.072 + 0.006 0.480 + 0.002 SUERC-43098 
CANI1_79 Peel Fort McPherson 11/06/2011 67.3313 134.8656 SL 0 146 72153 0.28 1.85 + 0.15 -26.6 0.085 + 0.007 0.315 = 0.002 SUERC-43099 
CANI0_17 Arctic Red Tsiigehtchic 07/09/2010 67.4394 133.7529 SL 6 123 73830 0.31 2:17. + 0.17 -26.8 0.080 + 0,006 0.299 + 0.002 SUERC-43073 
CANI0_19 Arctic Red Tsiigehtchic 07/09/2010 67.4394 133.7529 SL i} 123 71608 0.30 1.95 + 0.16 -26.8 0.083 + 0.007 0.291 + 0.002 SUERC-43076 


*River sample type: SL, suspended load; BM, bed material, collected at the thalweg (deepest part of the river channel cross-section). 


’SSC, suspended sediment concentration. 
“Aluminium to silicon ratio?®. 


“Fimoa from radiocarbon activity. 
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Extended Data Table 2 | Sediment samples from the offshore core MTWO1 


14, 
4, C Age ' . 
Depth Dated c 5 Chora 38°C AR Calibrated iain: Tees Date Grain Ad 5° Cow Al 
Interval (m) iiaiterial Age (permil) fiormulised ne age (cal. plus o é ion code Reported size [OC roiat] (%) (permil) N/OCio1a1 (ppm) 
cos (yrs) ee (yrs)! “ y yrs BP)‘ # ep fraction ct PP 
Mixed 
Benthic 1860 335+ OS- 
1A 0-0.2 Foraminifera £25 -14 2244189 85 1461 1548 1353: 103002 21/05/2013 >63um 1.34 + 0.11 -25.7 - - 
<63u.m 1.72 + 0.14 -26.1 0.09 a 0.01 89010 
Mixed 
7.62- Benthic 4280 335+ OS- 
6B 8.12 Foraminifera +20 -1.6 4661487 85 4462 4590 4333 103185 24/05/2013 >63um 1.40 + 0.11 -25.8 - - 
<63y.m 1.58 + 0.13 -26.1 0.08 Ea 0.01 89127 
Mixed 
16.92- Benthic 7080 335+ OS- 
12B 17.42 Foraminifera +35 -0.9 7472492 85 7612 7689 7517 95351 22/05/2012 >63um 1.19 + 0.10 -25.9 0.11 Ea 0.01 80960 
<63p.m 1.52 + 0.12 -26.0 0.09 oe 0.01 87687 
Mixed 
20.73- Benthic 8490 335+ OS- 
15B 21.09 Foraminifera +70 -0.9 88824110 85 9183 9308 9027 95606 04/06/2012 >63m 1.09 = 0.09 -25.9 - - 
<63.4m 1.50 + 0.12 -26.1 0.09 £ 0.01 - 


*See Methods. 

Reservoir age (see Methods). 

“Calibrated age in calibrated years before present, based on MARINE13 data set in CALIB v7.1 (Methods). 
‘Organic carbon concentration for the sediment samples. 
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Extended Data Table 3 | River bank samples from the Mackenzie River in 2009 


Sample ID Grain size fraction River Location Date Lat. Long. Type [OCrotai] (%) (sia Oe (permil) N/OCtotat Frod Publication Code 
CANO09-54 150-250um Mackenzie Tsiigehtchic 23/07/2009 67.45384 —-133.70741 ~—- Flooddep. = 2.02 + ~=O.11 -26.7 0.052 + 0.003 0.672 + 0.003 OS-78930 
CANO09-54 63-150um. Mackenzie Tsiigehtchic 23/07/2009 67.45384 —-133.70741 Flood dep. 1.01 + 0.03 -27.0 0.069 + 0.002 0444 + 0.002 OS-78929 
CAN09-54 <63um Mackenzie Tsiigehtchic 23/07/2009 67.45384 —:133.70741 Flood dep. 1.06 + 0.05 -27.0 0.098 + 0.005 0.371 + 0.001 OS-78928 
CAN09-54 Bulk Mackenzie Tsiigehtchic 23/07/2009 67.45384 ~—-133.70741 ~—- Flooddep. = (0.99 + _~—0.00 -26.9 0.085 + 0.002 0455 + 0.002 OS-78927 
CANO09-12 150-250nm Liard Fort Simpson 16/07/2009 61.84457—-121.31625. Flooddep. 17.47. + 1.24 -26.6 0.044 + 0.003 0.839 + 0.003 OS-79575 
CAN09-12 63-150um Liard Fort Simpson 16/07/2009 61.84457—-:121.31625. Flooddep. 0.84 + 0.01 -26.6 0.087 + 0.001 0.403 + 0.002 OS-79574 
CANO09-12 <63um Liard Fort Simpson 16/07/2009 61.84457 —-121.31625. Flooddep. 0.66 39+ 0.02 -26.6 0.132 + 0.006 0.364 + 0.002 OS-79573 
CAN09-12 Bulk Liard Fort Simpson 16/07/2009 61.84457 —-:121.31625 Flood dep. 1.04 + 0.06 -26.7 0.069 + 0.004 0.523 + 0.002 OS-79576 
CAN09-42 <63um Peel Fort McPherson — 22/07/2009 —-67.33189 ——-134.86912 _ Flood dep. 1.63 + 0.10 -27.1 0.089 + 0.005 0.328 + 0.002 OS-78922 
CAN09-42 Bulk Peel Fort McPherson 22/07/2009 _67.33189 _134.86912 _ Flooddep. 2.01 — + _—i0.04 -27.1 0.082 + 0.002 0440 + 0.002 OS-78921 
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Viral-genetic tracing of the input-output 
organization of a central noradrenaline circuit 


Lindsay A. Schwarz'*, Kazunari Miyamichi'*, Xiaojing J. Gao‘, Kevin T. Beier'*, Brandon Weissbourd', Katherine E. DeLoach', 
Jing Ren!, Sandy Ibanes*, Robert C. Malenka®, Eric J. Kremer*? & Liqun Luo! 


Deciphering how neural circuits are anatomically organized with 
regard to input and output is instrumental in understanding how 
the brain processes information. For example, locus coeruleus nor- 
adrenaline (also known as norepinephrine) (LC-NE) neurons 
receive input from and send output to broad regions of the brain 
and spinal cord, and regulate diverse functions including arousal, 
attention, mood and sensory gating’ *. However, it is unclear how 
LC-NE neurons divide up their brain-wide projection patterns and 
whether different LC-NE neurons receive differential input. Here 
we developed a set of viral-genetic tools to quantitatively analyse 
the input-output relationship of neural circuits, and applied these 
tools to dissect the LC-NE circuit in mice. Rabies-virus-based input 
mapping indicated that LC-NE neurons receive convergent syn- 
aptic input from many regions previously identified as sending 
axons to the locus coeruleus, as well as from newly identified pre- 
synaptic partners, including cerebellar Purkinje cells. The ‘tracing 
the relationship between input and output’ method (or TRIO 
method) enables trans-synaptic input tracing from specific subsets 
of neurons based on their projection and cell type. We found that 
LC-NE neurons projecting to diverse output regions receive mostly 
similar input. Projection-based viral labelling revealed that LC-NE 
neurons projecting to one output region also project to all brain 
regions we examined. Thus, the LC-NE circuit overall integrates 
information from, and broadcasts to, many brain regions, consist- 
ent with its primary role in regulating brain states. At the same 
time, we uncovered several levels of specificity in certain LC-NE 
sub-circuits. These tools for mapping output architecture and 
input-output relationship are applicable to other neuronal circuits 
and organisms. More broadly, our viral-genetic approaches pro- 
vide an efficient intersectional means to target neuronal popula- 
tions based on cell type and projection pattern. 

Figure la, b illustrates two extreme connectivity models for input- 
output relationships of projection neurons. At one extreme, neurons 
from each input A region connect to a subtype of B neurons that send 
output to a unique C region (Fig. 1a); this model segregates informa- 
tion into discrete pathways. At the other extreme, B neurons are homo- 
geneous in receiving input from all A regions and sending output to all 
C regions with a common probability distribution (Fig. 1b); this model 
allows an overall indiscriminate integration and broadcast of informa- 
tion. The input-output relationship for most neural circuits is not 
known (Supplementary Note 1). 

To determine the input-output organization of neural circuits, we 
have developed a method we have termed TRIO (for tracing the rela- 
tionship between input and output) and cell-type-specific TRIO 
(cTRIO). TRIO identifies neurons in A regions that synapse onto B 
neurons projecting to a specific C region. AB connections are 
determined by rabies-virus-mediated retrograde trans-synaptic tra- 
cing’, which relies on monosynaptic spread of EnvA-pseudotyped, 


glycoprotein deleted and GFP-expressing rabies viruses (RVdG 
hereafter) from starter cells in the B region. Starter cells express rabies 
glycoprotein (G) and the TVA receptor for EnvA fused with mCherry 
(TC)!°", which allow RVdG infection and complementation, enabling 
trans-synaptic spread to presynaptic neurons in A regions. In TRIO 
(Fig. 1c), expression of TVA-mCherry fusion and rabies glycoprotein 
depends on Cre/loxP-mediated recombination from adeno-associated 
viruses (AAVs) injected in region B. Cre is delivered at a specific C 
region by canine adenovirus type 2 (CAV hereafter) that efficiently 
transduces axon terminals'*’*. Thus, TRIO does not distinguish 
between different cell types within region B. In cIRIO (Fig. 1d), 
TRIO is performed in transgenic mice in which Cre recombinase is 
expressed in a specific cell type in the B region. Expression of the TVA- 
mCherry fusion and rabies glycoprotein delivered in the B region 
depends on Flp/FRT-based recombination and Flp recombinase from 
CAV-FLEx*”-Flp, which is injected at a specific C region. Thus, only B 
neurons that express Cre and project to a specific C region can become 
starter cells for RVdG-mediated trans-synaptic tracing. 

As a proof-of-principle, we applied TRIO and cTRIO to the mouse 
motor cortex (Fig. le). In TRIO experiments, we injected CAV-Cre in 
contralateral motor cortex and AA V-FLEx’-TC/G into motor cortex 
of wild-type mice; this resulted in starter cells in motor cortex layers 
2/3 (L2/3), L5, (Fig. 1f, left; arrowheads) and L6. In cTRIO experi- 
ments, we injected CA V-FLEx*”-Fip in contralateral motor cortex or 
medulla of retinol binding protein 4 (Rbp4)-Cre mice to target intra- 
cortical- or subcortical-projecting L5 neurons, respectively, and AAV- 
FLEx®®?-TC/G into motor cortex; this resulted in starter cells that were 
restricted to motor cortex L5 (Fig. 1f, middle and right), as predicted by 
the Rbp4-Cre expression pattern’. For comparison, we also performed 
trans-synaptic tracing in motor cortex of Rbp4-Cre mice, where starter 
cells were not selected based on output projections. A large majority of 
the GFP* input neurons to the motor cortex were from the cortex or 
thalamus (Fig. 1f, bottom). Interestingly, contralateral-motor-cortex- 
projecting L5 neurons received proportionally more input from the 
cortex, whereas medulla-projecting L5 neurons received more input 
from the thalamus. Almost all other inputs came from the globus 
pallidus, a recently identified direct input to the cortex’® (Fig. 1g; 
Supplementary Table 1). Control experiments indicated that rabies- 
mediated labelling of most local and all long-range input neurons 
were dependent on CAV-Cre for TRIO, and CAV-FLEx*-Flp and 
Rbp4-Cre for cTRIO (Extended Data Fig. 1). These experiments 
demonstrated that cTRIO can restrict starter cells to a more specific 
population than TRIO, and suggested that subcortical- and callosal- 
projecting L5 neurons receive differential thalamic versus cortical 
input. TRIO also identified presynaptic partners of callosal- or stria- 
tal-projecting motor cortex neurons in rat (Extended Data Fig. 2). 

As the precision of TRIO/cTRIO analysis is defined by CAV- 
mediated transduction from axons and presynaptic terminals, we 
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Figure 1 | Strategy and proof-of-principle of TRIO and cTRIO. 

a, b, Schematic of two extreme connection patterns of region B neurons with 
inputs from A regions and outputs to C regions. ¢, d, Strategies for trans- 
synaptic input tracing from B neurons based on their outputs. TRIO (c) does 
not distinguish between region B cell types projecting to the selected C region 
(two different cell types are outlined in grey and blue). cTRIO (d) avoids 
labelling promiscuous projections from Cre" cells (blue). Open and filled 
triangles, incompatible loxP sites; open and filled half circles, incompatible FRT 
sites. e, Schematic of TRIO and cTRIO in mouse motor cortex. CAV was 
injected into contralateral motor cortex or medulla along with AAVs 
expressing Cre- or Flp-dependent TVA-mCherry (TC)/rabies glycoprotein 
(G) into motor cortex, followed by RVdG. Experiments were performed in 
wild-type (TRIO) or Rbp4-Cre (cTRIO) mice. f, Example coronal sections of 
motor cortex starter cells in TRIO and cTRIO. cMC, contralateral motor cortex; 
Me, medulla. Cortical layers are separated by dotted lines based on the DAPI 
(4’,6-diamidino-2-phenylindole) stain (blue). Starter cells (yellow, a subset 


characterized its spread by injecting CA V-Cre and retrobeads into the 
Ail4 Cre-reporter mice’® in the piriform cortex at varied distances 
from the mitral cell axon layer and quantifying labelled mitral cells. 
We found that CAV spread mostly within 200 Lm from the injection 
site and could also infect axons in passage (Extended Data Fig. 3; 
Supplementary Note 2). CAV tropism was examined by injecting 
CAV-Cre into five diverse brain regions of Ail4 mice. Neurons known 
to project to these brain regions, some of which are >1 cm away from 
the injection sites, were efficiently labelled (Extended Data Fig. 4). 
Thus, CAV can infect axons and terminals of diverse neuronal types 
across long distances. 

We next applied our viral-genetic tools to the noradrenaline neu- 
rons in the locus coeruleus, a small bilateral nucleus in the brainstem 
(~1,500 noradrenaline neurons per locus coeruleus) that collectively 
project axons throughout the brain’~. It is unclear whether different 
LC-NE neurons receive differential input, how LC-NE neurons divide 


Fraction of total input neurons 


indicated by arrowheads) can be distinguished from input cells labelled only 
with GFP from RVdG (green). TC* cells in motor cortex spanned layers 2/3 
and 5 for TRIO (left), but were restricted to L5 with cTRIO. Bottom inset, 
example images of input neurons from the somatosensory cortex (SC) and 
ventral anterior thalamus (VA), derived from larger composites (see Methods). 
g, Average fraction of total input neurons in Rbp4-Cre-based input tracing and 
cTRIO of motor cortex L5 pyramidal neurons. Values represent the average 
fraction of input in each category (n = 4 animals for Rbp4-Cre input tracing and 
contralateral motor cortex cTRIO; n = 3 animals for medulla cTRIO). Two- 
way ANOVA determined that inputs to Rbp4-Cre* motor cortex starter cells 
generated by input tracing or cCTRIO (C = cMC or C = Me) are significantly 
different in brain regions from which they receive input (interaction 
P<0.0001). One-way ANOVA and post hoc Tukey’s multiple comparison 
tested the significance within each input region. Error bars, s.e.m. **P < 0.01; 
***P < 0,001. Scale bars, 250 tm (f, middle row), 100 um (f, bottom row). 


up their brain-wide projection patterns’’, and what their input-output 
relationships are. We first identified synaptic inputs received by 
these neurons using RVdG-mediated retrograde trans-synaptic tracing 
in dopamine f-hydroxylase (Dbh)-Cre mice (Fig. 2a), where Cre- 
dependent TVA-mCherry fusion and rabies glycoprotein expression 
was restricted to LC-NE neurons that express the noradrenaline bio- 
synthetic enzyme Dbh (Fig. 2b). Control experiments validated that Cre 
recombination occurred almost exclusively in locus coeruleus neurons 
expressing tyrosine hydroxylase (TH), another LC-NE neuron marker, 
and that long-range trans-synaptic tracing depended on Dbh-Cre and 
AAV-delivered rabies glycoprotein (Extended Data Fig. 5). 

We counted all input neurons to LC-NE starter cells from the 
anterior forebrain to posterior medulla (Fig. 2c—h) except in coronal 
sections immediately surrounding the locus coeruleus, as non-specific 
viral labelling of neurons can occur locally at the AAV/RVdG 
injection site (Extended Data Fig. 5c-f). We assigned each input 
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Figure 2 | Presynaptic input to LC-NE neurons revealed by rabies-mediated 
trans-synaptic tracing. a, Strategy for trans-synaptic tracing of input to 
LC-NE neurons. b, Coronal section of a mouse brain at the locus coeruleus 
(dotted square) stained with DAPI (blue). A region within the square is 
magnified in the inset. LC-NE starter cells (yellow) can be distinguished from 
cells receiving only TC from AAV (red) or only GFP from RVdG (green) at the 
injection site. c-g, Coronal sections showing representative input neurons in 
diverse brain regions. h, Sagittal section of the cerebellum showing trans- 
synaptically labelled Purkinje cells. Images in b-h were derived from larger 
composites. i, Schematic summary of brain regions that provide the largest 
average fractional inputs to LC-NE neurons (n = 9 animals). Scale bars, 1 mm 
(b), 50 um (b, inset; c-h). BNST, bed nucleus of the stria terminalis; CeA, 
central amygdala; DCN, deep cerebellar nuclei; IRN, intermediate reticular 
nucleus; LC, locus coeruleus; LH, lateral hypothalamus; LRN, lateral reticular 
nucleus; MRN, midbrain reticular nucleus; PAG, periaqueductal grey; PC, 
Purkinje cells; PGRN/GRN, paragigantocellular/gigantocelluar nucleus; POA, 
preoptic area; PRN, pontine reticular nucleus; PVH, paraventricular 
hypothalamic nucleus; SuC, superior colliculus; SVN, spinal vestibular nucleus; 
ZI, zona incerta. 


neuron to one of 111 brain regions according to the Allen Brain Atlas 
(http://mouse.brain-map.org/static/atlas) to categorize brain regions 
ipsi- or contralateral to the injected locus coeruleus (Supplementary 
Table 2). Regions that contributed more than 1% of total input from 
nine Dbh-Cre tracing brains are summarized in Fig. 2i. Although most 
brain regions we identified as presynaptic to LC-NE neurons are con- 
sistent with previous retrograde tracing studies®*, our experiment vali- 
dated that these neurons directly synapse onto LC-NE neurons rather 
than just projecting axons to the locus coeruleus. We also found that 
deep cerebellar nuclei and cerebellar Purkinje cells contributed a not- 
able fraction of direct synaptic input to LC-NE neurons (Fig. 2h, i), 
which (to our knowledge) has not been previously reported. Labelled 
Purkinje cells were enriched in the ipsilateral medial zones throughout 
the cerebellum, at distances up to 2.5mm away from the locus coer- 
uleus (Extended Data Fig. 6a). Consistent with a direct connection 
between Purkinje cells and LC-NE neurons, we found that an inhibitory 
postsynaptic marker, gephyrin, was present in TH’ LC-NE dendrites 
apposing GABAergic Purkinje cell axons (Extended Data Fig. 6b, c). 
We next applied TRIO and cTRIO to test if populations of LC-NE 
neurons, defined by their output targets, received distinct input. We 
selected five diverse brain regions known to receive LC-NE projec- 
tions: the olfactory bulb, auditory cortex, hippocampus, cerebellum 
and medulla (Fig. 3a). CAV-Cre injection into these regions in Ail4 
mice confirmed labelling of noradrenaline neurons throughout the 
locus coeruleus (Extended Data Fig. 7a). We did not observe signifi- 
cant differences in the spatial distribution along the anterior—posterior 
or medial-lateral axes for LC-NE neurons that projected to these 
brain regions. However, forebrain-projecting LC-NE neurons were 
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Figure 3 | Input-output relationship of LC-NE neurons revealed by TRIO 
and cTRIO. a, Schematic of CAV injections into locus coeruleus output 
regions for TRIO and cTRIO. AC, auditory cortex; Cb, cerebellum; Hi, 
hippocampus; Me, medulla; OB, olfactory bulb. b-d, Average fractional inputs 
in Dbh-Cre-based input tracing (grey, n = 9 animals), TRIO (Hi, purple; 
OB, red; AC, blue; n = 4 animals each), and cTRIO (Cb, green; Me, orange; 
n= 4 animals each). Input neurons were grouped into 16 broader categories. 
Magnified insets highlight the average fraction of input from striatum-like 
amygdala (>98% from the central amygdala) (c) or from Purkinje cells (d) to 
LC-NE neurons that project to the 5 output regions or in Dbh-Cre-based input 
tracing. Error bars, s.e.m. 


more dorsally biased compared to the hindbrain-projecting ones 
(Extended Data Fig. 7b-f), consistent with a previous observation in 
the rat’®. We applied TRIO to olfactory bulb, auditory cortex and 
hippocampus, and cTRIO to cerebellum and medulla, as locus coer- 
uleus projections to the former group predominately came from TH* 
neurons, whereas the latter group contained TH neurons (Extended 
Data Fig. 7a). Control experiments indicated that the labelling of input 
neurons depended on CAV-Cre in the case of TRIO (Extended Data 
Fig. 5c, e), and on both Dbh-Cre and CAV-FLEx'’*-Flp in the case of 
cTRIO (Extended Data Fig. 8). 

We analysed inputs for the TRIO and cTRIO experiments analogous 
to Dbh-Cre-based input tracing (Supplementary Table 2). We observed 
that LC-NE neurons received inputs from all input regions regardless of 
their diverse output, with a grossly similar proportional distribution 
(Fig. 3b). These data suggest that the LC-NE circuit is largely indis- 
criminate with respect to its input-output relationship. However, 
region-by-region one-way ANOVA (Supplementary Table 3, top) 
rejected the overall null hypothesis that input distribution is inde- 
pendent of output conditions (combined P= 0.002), indicating 
that the input-output relationships were not entirely homogeneous. 
Of the individual input that exhibited the smallest P values (Supple- 
mentary Table 3, bottom), LC-NE neurons projecting to the medulla 
received less input from the central amygdala (Fig. 3c). In addition, 
the fraction of Purkinje cell inputs in Dbh-Cre based tracing was 
higher than any of the TRIO/cTRIO conditions (Fig. 3d), suggesting 
that Purkinje cells contribute input to an LC-NE population that do 
not project axons extensively to any of the output sites we examined. 

The largely indiscriminate input-output relationship revealed by 
TRIO can in principle be accounted for by input convergence 
(Fig. 1b, left), output divergence (Fig. 1b, right), or both. A simulation 
analysis of the two sparsest input tracing samples (Supplementary 
Table 2) suggested that individual LC-NE neurons must receive input 
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Figure 4 | Broad output divergence of LC-NE neurons revealed by projection- 
based viral-genetic labelling. a, In this strategy, neurons in region B projecting 
to the C region where CAV-Cre is delivered are labelled, including their 
collaterals to other output regions (for example, blue neurons to C2). b, In this 
strategy, only Cre" neurons in region B projecting to the C region where CAV- 
FLEx°*?-Flp is delivered are labelled. c, Schematic for data in (d). In this 
example, CAV was injected in the olfactory bulb and TC was injected in the 
locus coeruleus. TC’ LC-NE axons were imaged in the designated brain 
regions. All TC* axons were co-stained with anti-noradrenaline transporter 
(NET; inset), confirming their noradrenaline identity. d, Average normalized 
fraction of TC’ LC-NE axons in each brain region when CAV was injected into 
four output sites, or Cre-dependent TC was injected directly into the locus 


from more than 15 or 9 brain regions, respectively (Extended Data 
Fig. 9). This is most likely a lower bound as rabies tracing efficiency is 
far from 100%. However, such extensive integration is not entirely 
homogenous (Supplementary Table 4). Thus, individual LC-NE neu- 
rons integrate inputs from many regions, yet exhibit heterogeneity 
with respect to brain regions from which they receive input. 

We next explored the output architecture of LC-NE neurons 
(Fig. la, b, right). Previous dual-retrograde-tracer experiments indi- 
cated that individual LC-NE neurons could project to two brain 
regions far apart’ *’, but did not examine collateralization between 
more than two output regions in a given experiment. We devised a 
general method for tracing output divergence of specific neuronal 
populations based on their projection to one output site (Fig. 4a, b). 
We found that populations of LC-NE neurons projecting to the olfact- 
ory bulb, auditory cortex, hippocampus or medulla also projected to all 
seven additional brain regions analysed (Fig. 4c, d and Extended Data 
Fig. 10). Thus, the output of LC-NE neuronal populations is highly 
divergent, resembling the broadcast model (Fig. 1b, right) much more 
than the discrete output model (Fig. 1a, right). We nevertheless found 
a general trend of increased axon density in the output region where 
labelling was initiated compared to labelling initiated from the locus 
coeruleus, with the bias from olfactory bulb- or medulla-initiated 
labelling reaching statistical significance (Fig. 4e). This suggests that 
LC-NE neurons projecting to the olfactory bulb or medulla contain 
populations with biased output to these regions, consistent with the 
observation that LC-NE neurons projecting to these regions have a 
biased distribution along the dorsoventral axis in the locus coeruleus 
(Extended Data Fig. 7a). 


coeruleus of Dbh-Cre mice (colour code on top right). LC: n = 4 animals 
(Dbh-Cre); Hi: n = 4 animals (Dbh-Cre); AC: n = 4 animals (2 wild-type, 

2 Dbh-Cre); OB: n = 5 animals (3 wild-type, 2 Dbh-Cre); Me: n = 4 animals 
(Dbh-Cre). The average number of LC-NE neurons labelled in each condition 
was: 855 + 102 (mean + s.e.m, in locus coeruleus, n = 4 animals); 235 + 35 
(Hi, n = 4 animals); 80 + 31 (OB, n= 5 animals); 114 + 31 (AC, n=4 
animals); 202 + 63 (Me, n = 4 animals). e, Comparison of the fraction of Tc* 
axons at CAV injection sites between projection-based and direct locus 
coeruleus labelling methods. Unpaired two-tail t-tests. *P < 0.05. Error bars, 
s.e.m. Scale bar, 10 jm. Abbreviations: AC, auditory cortex; Cb, cerebellum; 
CC, cingulate cortex; Hi, hippocampus; Hy, hypothalamus; LC, locus coeruleus; 
Me, medulla; OB, olfactory bulb; SC, somatosensory cortex. 


Our study provides the first whole-brain quantitative analysis of 
synaptic input onto LC-NE neurons (Figs 2i and 3b-d and 
Supplementary Table 2). Although the total input encompasses brain 
regions that control cognitive, autonomic, endocrine and somatic 
motor activities, LC-NE neurons receive abundant input from 
motor-related nuclei in the midbrain, pons, medulla and cerebellum 
(Fig. 3b). Our viral-genetic tools also revealed a highly extensive output 
divergence (Fig. 4d). Together with input convergence, this probably 
explains the largely indiscriminate input-output relationship of LC- 
NE neurons (Fig. 3b). This property fits well with a primary function of 
the LC-NE neurons in regulating states of the entire brain during sleep/ 
wake cycles and arousal*”**». 

Despite the overall integrative nature, however, our data also 
revealed specificity in the input-output relationship of LC-NE 
sub-circuits. Medulla-projecting LC-NE neurons receive dispropor- 
tionally smaller input from the central amygdala than LC-NE neu- 
rons projecting to other regions (Fig. 3b, c). Input from the central 
amygdala to the locus coeruleus is an important component for 
initiating stress response”®. Our observation implies that modulation 
of medulla by LC-NE neurons is preferentially immune to this type 
of stress input. Although our output studies demonstrated the broad 
projection pattern of LC-NE neurons, they also highlighted specifi- 
cities both in regard to biased cell body distribution within the locus 
coeruleus (Extended Data Fig. 7f) and biased projections (Fig. 4e). 
The existence of such input-output specificity, along with differ- 
ential distribution of adrenergic receptors in target neuronal popu- 
lations*, enables the LC-NE circuit to selectively modulate specific 
targets. 
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The viral-genetic tools we described here can be applied to other 
circuits in the mammalian brain, such as motor cortex (Fig. 1 and 
Extended Data Fig. 2). TRIO and particularly cTRIO have extended 
previous projection-selective targeting methods*”” for analysing 
complex circuits in the central nervous system. Furthermore, inter- 
secting projection and cell type using CAV-FLEx’*”-Flp and numer- 
ous Cre transgenic mice can refine genetic access to specific 
populations of neurons to record and functionally manipulate their 
activity’. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Animals. Dbh-Cre mice*! and Rbp4-Cre transgenic mice’ were obtained from the 
Mutant Mouse Regional Resource Center. Purkinje cell protein 2 (Pcp2)-Cre** and 
the ROSA26“‘"* Cre-dependent tdTomato reporter (Ai14)'° were obtained from 
the Jackson Laboratories. Mice were housed on a 12-h light/dark cycle with food 
and water ad libitum. Transgenic mice were of a mixed genetic background, and 
there was a similar distribution of male and female mice included in all experi- 
ments. Wister Rats were purchased from Japan SLC (Hamamatsu, Japan). All rats 
used for experiments were female. All procedures for mice followed animal care 
guidelines approved by Stanford University’s Administrative Panel on Laboratory 
Animal Care (APLAC). All rat experiments were performed in accordance with 
the animal care and use committee guidelines of the University of Tokyo. 

DNA constructs. CAG-FLEx*”-G and CAG-FLEx”**-TC (same as CAG-FLEx- 
TC") have been described previously". CAG-FLEx'®"-G and CAG-FLEx'®'-TC 
were constructed using standard molecular cloning methods with enzymes from 
New England Biolabs (Ipswich, USA). The custom DNA fragment (389 base pairs) 
shown below was synthesized by DNA 2.0 (Menlo Park, USA). This DNA fragment 
contained restriction enzyme sites and two heterospecific pairs of FRT (shown by 
the underline) and FRT5 (shown by bold italic) in the following “order: 5’-Not!- 
Mlul-KpnI-FRT-FRT5-Sall-AscI-FRT(complementary)-FRT5(complementary)- 
HindIII-Spel-Notl, with the sequence as follows: 5’-GCGGCCGCACGCGTAC 
GTGGTACCGAAGTTCCTATTCCGAAGTTCCTATTCTCTAGAAAGTATA 
GGAACTTCATCAAAATAGGAAGACCAATGCTTCACCATCGACCCGAA 
TTGCCAAGCATCACCATCGACA GAAGTTCCTATTCCGAAGTTCCTAT 
TCTTCAAAAGGTATAGGAACTTOGTCGACAATTGGCGCGCCGAAGTT 
CCTATACTTTCTAGAGAATAGGAACTTCGGAATAGGAACTTCCGTTGG 
GATTCTTCCTATTTTGATCCAAGCATCACCATCGACCCTCTAGTCCAG 
ATCTCACCATCGACCCGAAGTTCCTATACCTTTTGAAGAATAGGAAC 
TTCGGAATAGGAACTTCAAGCTTAATTACTAGTGCGGCCGC. 

This DNA fragment was cloned into the modified pBluescript II SK vector that 
contains only a NotI recognition sequence in the cloning site. We serially inserted 
into this pBluescript the following DNA fragments by using the unique restriction 
sites. (1) HindIII/Spel fragment containing the WPRE and human growth hor- 
mone polyA signal obtained from pAAV-TRE-HTG (Addgene number 27437)”. 
(2) Mlul/KpnI fragment containing the CAG promoter. To make this fragment, 
we sub-cloned PstI-Xmal flanked CAG promoter from pCA-T-int-G (Addgene 
number 36887)" into a modified pBluescript II SK vector that only contains Mlul- 
PstI-XmaI-KpnI recognition sequence in the cloning site. (3) AscI/Sall fragment 
containing coding sequence of G or TC cassette, obtained from pAAV CAG- 
FLEx*?_G (Addgene number 48333) or pAAV CAG-FLEx'**?-TC (Addgene 
number 48332)'°. The assembled cassettes were sub-cloned into pAAV-MCS 
(AAV helper free system, Stratagene, catalogue number 240071-12) using the 
NotI sites. Flp-dependent mCherry expression from CAG-FLEx'®'-TC was 
confirmed by transient transfection into cultured HEK293 cells by using a Flp- 
expressing plasmid (data not shown). To generate CAV-FLEx’””*”-Flp, we first 
constructed the CAV targeting vector pCAV-FLEx*’-Fip. Ascl-Sall flanking 
Flpo coding sequence was PCR amplified by using pBT340 (Addgene number 
52549) as a template and sub-cloned into SalI-Ascl site of a precursor of pAAV 
CAG-FLEx’*”-TC® (ref. 10) that does not contain WPRE-polyA signal. Xmal- 
Notl fragment containing FLEx’”*”-Flp was then subcloned into KpnI-Not! site of 
pCL20c lentivirus vector” by using blunt-end ligation. EcoRI-Spel fragment con- 
taining FLEx'°*’-Flp was then subcloned into EcoRI-EcoRV site of pTCAV-12vk 
(description available upon request to E.J.K.) by using one-sided (Spel site) blunt- 
end ligation, resulting in pCAV-FLEx”””-Flp. 

Virus preparations. All viral procedures followed the Biosafety Guidelines 
approved by the Stanford University Administrative Panel on Laboratory 
Animal Care (A-PLAC), Administrative Panel of Biosafety (APB), and equivalent 
committees of the University of Tokyo. Recombinant AAV vectors (serotype 5 for 
TVA receptor fused with mCherry and serotype 8 for rabies glycoprotein) were 
produced in the Stanford University or University of North Carolina Viral Core. 
The AAV titre was estimated to be 2.6 and 1.3 X 10’? viral particles per ml for 
CAG-FLEx"®"-TC and CAG-FLEx"*’-G, based on quantitative PCR analysis. 
RVdG was prepared as previously described**. The pseudotyped RVdG titer was 
estimated to be ~5 X 10” infectious particles per ml based on serial dilutions of the 
virus stock followed by infection of the 293-TVA800 cell line. The recombinant 
CAV-Cre and CAV-FLEx*”-Flp were generated, expanded and purified by prev- 
iously described methods”. The final titre of CA V-Cre and CAV-FLEx"**”-Flp were 
2.5 X 10’? and 5 X 10" viral particles per ml, respectively. All handling of CAV 
and rabies virus followed procedures approved by Stanford University’s 
Administrative Panel on Biosafety (APB) for biosafety level 2, and the equivalent 
committees of the University of Tokyo (P2/P2A). 

Locus coeruleus trans-synaptic input tracing. Experiments in Fig. 2 were per- 
formed in Dbh-Cre mice at 8-12 weeks of age following procedure as described 
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previously'®**. Mice were anaesthetized with 65 mg per kg ketamine and 13 mg 
per kg xylazine (Vedco/Lloyd Laboratories) via intraperitoneal injection. Then 
~0.5 pl of a 1:1 mixture of AAV8 CAG-FLEx’*?-G and AAV5 CAG-FLEx*?-TC 
were injected into the left locus coeruleus of the mouse using stereotaxic 
equipment (Kopf). The coordinates were 0.8 mm lateral from midline, 0.8 mm 
posterior from lambda, and 3.2mm ventral from the surface of the brain. 
Two weeks later, 0.3-0.5 ul RVdG was injected into the same area of the locus 
coeruleus using the procedure described above. After recovery, mice were 
housed in a biosafety level 2 (BSL2) facility for 4 days before euthanasia. 
Assessment of locus coeruleus terminal infectivity by CA V-Cre. Experiments in 
Extended Data Figs 4 and 7 were performed in Ail4 mice 6-8 weeks of age. Mice 
were anaesthetized and injected, as described above, with 0.25-0.5 tl CAV-Cre plus 
0.02 pil green retrobeads (Lumofluor, USA) into predicted LC output sites: olfactory 
bulb (OB), auditory cortex (AC), hippocampus (Hi), medulla (Me), or cerebellum 
(Cb). The coordinates used for CAV-Cre injection sites are listed as measurements 
from bregma for OB, AC and Hi, and from lambda for Me and Cb. Ventral 
measurements are from the surface of the brain. OB: 0.75 mm lateral, 4.0 mm 
anterior, 1 mm ventral; AC: 4.2 mm lateral, 2.5 mm posterior, 0.8 mm ventral; Hi: 
1.5 mm lateral, 2 mm posterior, 1.5 mm ventral; Me: 0.75 mm lateral, 3.3 mm pos- 
terior, 3.5 mm ventral; Cb: 2.0 mm lateral, 3.0 mm posterior, 1.5 mm ventral. After 
recovery, mice were housed in a BSL2 facility for 5-7 days before euthanasia. 
Locus coeruleus projection-based viral labelling. Experiments in Fig. 4 and 
Extended Data Fig. 10 were performed in wild-type or Dbh-Cre mice 8-12 weeks 
of age. Mice were anaesthetized and injected with ~0.25 tl AAV5-expressing TC 
(CAG-FLEx*”-TC for wild-type mice or CAG-FLEx*®"-TC for Dbh-Cre mice) into 
the left locus coeruleus as described above. Mice were also injected with ~0.5 ul 
CAV-Cre (wild-type mice) or CAV-FLEx?*? -Flp (Dbh-Cre mice) at locus coeruleus 
output sites in the ipsilateral hemisphere (OB, AC, Hi, Me; coordinates listed above). 
After recovery, mice were housed in a BSL2 facility for 3-4 weeks before euthanasia. 
Locus coeruleus TRIO and cTRIO. Experiments shown in Fig. 3 were performed 
in wild-type (TRIO) or Dbh-Cre (cTRIO) mice at 8-12 weeks of age. For TRIO, 
mice were anaesthetized and injected with ~0.5 ll ofa 1:1 mixture of AAV8 CAG- 
FLEx'°*?-G and AAV5 CAG-FLEx*?-TC into the left locus coeruleus, and also 
injected with ~0.5 pl CAV-Cre into ipsilateral OB, Hi, or AC using coordinates 
described above. For cTRIO, mice were anaesthetized and injected with ~0.5 ul of 
a 1:1 mixture of AAV8 CAG-FLEx""'-Gand AAV5 CAG-FLEx'*"-TC into the left 
locus coeruleus, and also injected with ~0.5 ll CAV-FLEx”*”-Flp into either ipsi- 
lateral Cb or Me using coordinates described above. After recovery, mice were 
housed in a BSL2 facility. Two weeks later, 0.3-0.5 jl RVdG was injected into the 
locus coeruleus using the procedure described above. After recovery, mice were 
housed in a BSL2 facility for 4 days before euthanasia. 

Motor cortex TRIO and cTRIO. Experiments in Fig. 1 were performed in wild- 
type (TRIO) or Rbp4-Cre (cTRIO) mice at 8-12 weeks of age. For TRIO, mice were 
anaesthetized and injected with ~0.5 pl ofa 1:1 mixture of AAV8 CAG-FLEx””-G 
and AAV5 CAG-FLEx*”-TC into the left motor cortex (MC), and also injected 
with ~0.5 ul CAV-Cre into contralateral MC (cMC) using the following coordi- 
nates from bregma: 1.5mm lateral, 1.5mm anterior, 0.8mm ventral from the 
surface of the brain. For cTRIO, Rbp4-Cre mice were anaesthetized and injected 
with ~0.5ul of a 1:1 mixture of AAV8 CAG-FLEx’®'-G and AAV5 CAG- 
FLEx'®"-TC into the left motor cortex, and also injected with ~0.5 pl CAV- 
FLEx*?. -Flp into the coordinates described above (for cMC), or the following 
coordinates for medulla (from lambda): 1 mm lateral, 3mm posterior, 4mm 
ventral from the surface of the brain. After recovery, mice were housed in a 
BSL2 facility. Two weeks later, 0.3-0.5 1] RVdG was injected into motor cortex. 
After recovery, mice were housed in a BSL2 facility for 4 days before euthanasia. 
Rat motor cortex TRIO. For TRIO experiments in rat (Extended Data Fig. 2), 
~0.4 il of a 1:1 mixture of AAV2 CAG-FLEx'*’-G and AAV2 CAG-FLEx”**-TC 
was injected into the brain of ~5-week-old Wister rat using stereotaxic equipment 
(Narishige, Japan). During surgery, animals were anaesthetized with 65 mg per kg 
ketamine and 13 mg per kg xylazine. For motor cortex injections, the needle was 
placed 2.5 mm anterior and 2.3 mm lateral from the bregma, and 0.9 mm ventral 
from the brain surface. ~0.5 pl CAV-Cre was injected subsequently into either the 
ipsilateral striatum (1.0 mm posterior and 4.0 mm lateral from the bregma, and 
4.0 mm ventral from the brain surface) or the contralateral motor cortex. After 
recovery, animals were housed in a BSL2 room. Two weeks later, 0.3 tl RVdG was 
injected into the AAV injection site under anaesthesia. After recovery, animals 
were housed in a BSL2 room for 4 days before euthanasia. 

Control TRIOs. Control experiments (Extended Data Figs 1, 2, 5 and 8) were 
performed using conditions described above for locus coeruleus and motor cortex 
experiments. 

Characterizing CAV-Cre spread in the piriform cortex. To test the extent 
of local spread of CAV-Cre in the injection site (Extended Data Fig. 3), ~0.3 ul 
CAV-Cre plus ~0.025 pl green retrobeads (Lumofluor, USA) were injected into 
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the anterior piriform cortex or surrounding areas of adult mice heterozygous for 
the Ail4 Cre reporter. During surgery, animals were anaesthetized with 65 mg 
per kg ketamine and 13 mg per kg xylazine. The stereotactic coordinates were 
anterior 1.7 mm from the bregma; lateral 1.7-2.8 mm from the midline; ventral 
2.5-4.0 mm from the surface of the brain. One week after the injection, animals 
were perfused and brain tissue was processed and sectioned as described in the 
histology and imaging section. Then, 60-|1m coronal sections through the olfact- 
ory bulb until the end of anterior piriform cortex (APC) were collected. The needle 
location was visualized by the presence of concentrated retrobeads. The distance 
between the needle tip and the nearest layer 1a of anterior piriform cortex or lateral 
olfactory tract was measured. In the main and accessory olfactory bulb, the num- 
ber of tdTomato* cells was counted either in every section (when total number of 
labelling was less than ~1,000 cells) or in every other section (when total the 
number of labelling was greater than ~1,000 cells). 

Histology and imaging. For all tracing analyses and CAV-Cre infectivity analyses 
(Figs 1-4 and Extended Data Figs 1-5, 6a, 7, 8 and 10), animals were perfused 
transcardially with phosphate-buffered saline (PBS) followed by 4% para- 
formaldehyde (PFA) in PBS. Brains were dissected, post-fixed in 4% PFA for 
24h, and placed in 30% sucrose in PBS for 24-48 h. After embedding in 
Optimum Cutting Temperature (OCT, Tissue Tek), samples were stored at 
—80°C until sectioning. For Figs 1-3 and Extended Data Figs 1-6a, and 8, con- 
secutive 60-11m coronal sections were collected onto Superfrost Plus slides, washed 
2 X 20 min with PBS, and stained with DAPI (1:10,000 of 5 mg ml}, Sigma- 
Aldrich), which was included in the last PBS wash. Slides were coverslipped with 
Fluorogel (Electron Microscopy Sciences). Samples were imaged using a Leica 
Ariol slide scanner with the SL200 slide loader. Briefly, the scanner first imaged 
slides using a 1.25% objective and the “TissueFind’ function to generate composite 
brightfield images of the entire slide. Then, the scanner automatically detected 
individual tissue sections from these brightfield images (~15-20 coronal tissue 
sections per slide), and performed automated tiled imaging of each tissue 
section on the slide in two channels (DAPI and Spectrum Green filters) using 
a 5X objective. Each tile was approximately 1.2mm by 1.2 mm, and included 
~20-11m overlap between tiles. Leica Ariol software automatically stitched 
together individual tiles during image collection to generate a composite SCN file 
of the entire slide. For analysis of locus coeruleus output (Fig. 4 and Extended Data 
Fig. 10), every 50-j1m sagittal section within the brain regions designated for 
analysis were collected sequentially into PBS. Sections were washed 2 X 10 min 
in PBS and blocked for 2-3 h at room temperature (RT) in 10% normal donkey 
serum (NDS) in PBS with 0.3% Triton-X100 (PBST). Primary antibodies (mouse 
anti-noradrenaline transporter (NET), PhosphoSolutions, 1447-NET, 1:10,000; 
rat anti-mCherry, M11217, Invitrogen, 1:2,000) were diluted in 5% NDS in 
PBST and incubated for four nights at 4°C. After 3 X 10 min washes in PBST, 
secondary antibodies were applied for 2-3 h at room temperature (donkey anti- 
mouse, Alexa-488, and donkey anti-rat Cy3, Jackson ImmunoResearch), followed 
by 3 X 10 min washes in PBST. Sections were additionally stained with DAPI. For 
immunostaining of locus coeruleus neuron cell bodies (Extended Data Figs 5 and 
7), 50-|4m coronal sections through the locus coeruleus were collected into PBS. 
Sections were washed, immunostained, and mounted as described above, using a 
primary antibody for tyrosine hydroxylase (rabbit anti-tyrosine hydroxylase (TH), 
Millipore, AB152, 1:2,000). All images were processed using NIH Image] software. 
For gephyrin immunostaining (Extended Data Fig. 6), fresh tissue was processed 
and 14-\1m horizontal sections were collected through the locus coeruleus follow- 
ing ‘Method B’ ina previously published protocol”. Sections were immunostained 
with primary antibodies for gephyrin (mouse anti-gephyrin, Synaptic Systems, 
147011, 1:700), and tyrosine hydroxylase (Millipore, 1:2,000). Representative 
images in Fig. 1f (top and middle) and Extended Data Figs 1, 4, 5c, d, 6b, ¢ (left), 
7a, Cc, 8a, b (bottom) and Extended Data Fig. 10a were obtained on a Zeiss epi- 
fluorescence microscope with a Nikon CCD camera. Representative images in 
Figs 2b (inset), 4c (inset), Extended Data Figs 5a, 6b (right), c (middle and right), 
and 7a (inset) were obtained on a Zeiss LSM 780 confocal microscope. 
Representative images in Figs 1f (bottom), 2b-h, and Extended Data Fig. 6a were 
obtained on a Leica Ariol slide scanner with the SL200 slide loader. Representative 
images in Extended Data Figs 2 and 3 were obtained by cooled CCD camera 
(ORCA-R2, Hamamatsu Photonics) connected with a upright fluorescent micro- 
scope (4X or 10X objective, BX53, Olympus). 

Data analysis for trans-synaptic tracing and TRIO. Because each brain differed 
in total numbers of input neurons, we normalized neuronal number in each region 
by the total number of input neurons counted in the same brain. For trans- 
synaptic tracing and TRIO/cTRIO analyses (Figs 1-3, and Supplementary 
Tables 1 and 2), GFP* input neurons were manually counted from every 
60-pm section through the entire brain, except near the starter cell location (motor 
cortex or locus coeruleus), as specified in Extended Data Figs 1 and 5. GFP* input 
neurons were assigned to specific brain regions based on classifications of the 


Allen Brain Atlas (http://mouse.brain-map.org/static/atlas), using anatomical 
landmarks in the sections visualized by DAPI counterstaining and autofluores- 
cence of the tissue itself. In a small minority of cases, assignment of input neurons 
to specific brain nuclei may be approximate if GFP* cell bodies were located on 
borders between regions, or when anatomical markers were lacking between 
directly adjacent regions (such as hypoglossal nucleus/nucleus prepositus or lat- 
eral reticular nucleus/gigantocellular reticular nucleus). However, quantitative 
analyses of input tracing results (Figs 1-3) were performed on anatomical classi- 
fications (specified by the Allen Brain Atlas) that were at least one hierarchical level 
broader than the discrete brain regions in which GFP* cells were originally 
assigned to. For instance, the fraction of GFP* cells designated to ‘paraventricular 
hypothalamic nucleus’ and ‘lateral hypothalamic nucleus’ were grouped into a 
broader category of ‘hypothalamus’. In almost all cases where individual GFP* 
cells were difficult to classify, their location was within brain regions that belonged 
to the same broad group. Also, some brain regions partially overlapped with the 
region excluded from analysis, such as motor and somatosensory cortex for the 
motor cortex tracing/cTRIO analyses (Fig. 1g, Supplementary Table 1), and the 
dorsal raphe, periaqueductal grey, and pontine reticular nucleus for the locus 
coeruleus tracing/TRIO/cTRIO analyses (Figs 2i, 3b, and Supplementary Table 
2). Therefore, the inputs reported for these regions are likely under-representa- 
tions of their contribution to motor cortex or locus coeruleus input. We did not 
adjust for the possibility of double-counting cells in any of our quantifications, 
which likely results in slight over-estimates, with the amount of over-estimation 
depending on the size of the cell in each region quantified. Nearly all starter cells 
were TH” for TRIO experiments with the CAV injection site being hippocampus 
(98.6% + 0.8%, n = 4 animals), olfactory bulb (96.0% + 2.7%, n = 4 animals), or 
auditory cortex (98.8% + 1.2%, n = 4 animals), consistent with our observation 
that cells at the locus coeruleus projecting to these target areas are predominantly 
noradrenaline neurons (Extended Data Fig. 7a). All starter cells were confirmed to 
be TH* in cTRIO experiments with the CAV injection site in Cb (100%, n = 4 
animals) or Me (100%, n = 4 animals). 

Data analysis for locus coeruleus output. For locus coeruleus output quantifica- 
tion (Fig. 4 and Extended Data Fig. 10), images were taken from 5 consecutive 
sections in each of the 8 brain regions (OB, AC, CC, SC, Hi, Hy, Cb, Me) ona Zeiss 
epifluorescence microscope with a 10X objective. We attempted to image from 
identical volumes within each brain region between samples, based on pre-deter- 
mined coordinates for these regions and the section number, as sections were 
collected sequentially and kept in order. The field of view for each image was 
located based solely on DAPI staining so the experimenter was blind to the level 
of TC™ locus coeruleus axons before imaging. An image was also taken of the 
noradrenaline transporter (NET) immunostaining in the same field of view to 
confirm that all TC* axons were also NET“. For analysis, the TC channels for each 
image were made binary in Image], after performing background subtraction and 
thresholding to a value ~4X greater than mean background intensity. The pixel 
densities of these binary images were measured and averaged (5 images per brain 
region) to determine the fraction of TC* axons resulting from LC-NE neurons in 
each brain region, as a fraction of the total TC axons quantified from all of the 
imaged regions in that brain. 

Analysis of spatial distribution in the locus coeruleus. To assess the distribution 
of discrete LC-NE neurons within the locus coeruleus dependent on their output 
(Extended Data Fig. 7), 50-jum coronal sections were collected in order through the 
locus coeruleus of experimental mice and processed as described above in the 
histology and imaging section. The experimenter was blind to the location of the 
output injection site when imaging and quantifying locus coeruleus sections. The 
outlines of the digital model were manually drawn using TH immunostaining of LC- 
NE cell bodies as a guide. A cross (+) marks the approximate centre of each locus 
coeruleus section following the procedure below. (1) measure the maximal height 
(Hmax) of the locus coeruleus (based on the shape of TH immunostaining); (2) 
measure the maximum width (Winax) of the locus coeruleus at the 0.5 Hmax height 
level; (3) place a cross at the 0.5 Wimax position. This cross was used to align each 
locus coeruleus image with its corresponding digital locus coeruleus section, before 
designating the location of the tdTomato  LC-NE cell bodies with coloured dots on 
the digital section. Before quantifying experimental sections, two independent sets of 
TH-immunostained locus coeruleus sections were found to fit within the boundaries 
of the digital model using this method of alignment. The distribution of tdTomato* 
LC-NE dots from each digital section were counted and assigned to dorsal/ventral 
and medial/lateral sub-regions using horizontal and vertical lines drawn through the 
centre cross of each digital section, respectively (Extended Data Fig. 7). 

Input simulation. We simulated the number of input areas for the two sparsest 
Dbh-Cre tracing samples with Matlab. For each starter cell, when assuming it 
receives inputs from n areas, we randomly sampled n areas from the 111 input 
areas without replacement, weighted by the total counts of cells in each area 
derived from all the Dbh-Cre brains. The simulated input areas to the 4 (sparsest 
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sample) or 22 (second sparsest sample) starter cells were then consolidated to 
generate the final number of input areas. Ten thousand rounds of simulations were 
performed for each n between 11 and 30 (sparsest sample) or 3 and 22 (second 
sparsest sample). 

Statistical methods. No statistical methods were used to predetermine sample 
size. Animals were excluded from certain experiments using the following pre- 
established criteria. For all trans-synaptic tracing, TRIO, and cTRIO experiments 
in motor cortex and locus coeruleus, samples were excluded if less than 50 GEP* 
neurons were observed in the brain outside of the area designated as local back- 
ground. For projection-based viral-genetic labelling, samples were excluded if less 
than 10 LC-NE neurons were observed to be TC”. No method of randomization 
was used in any of the experiments. For ANOVA analyses, the variances were 
similar as determined by Brown-Forsythe test. 
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Extended Data Figure 1 | Controls for TRIO and cTRIO at the motor 
cortex. a—c, Negative control experiments omitting CA V-Cre for TRIO (a), and 
omitting CA V-FLEx*”-Flp (b) or the Rpb4-Cre transgene (c) for cTRIO 
showed only local non-specific infection of RVG. This background labelling is 
likely due to Cre- or Flp-independent leaky expression of a small amount of 
TVA-mCherry (TC), too low for mCherry to be detected but still capable of 
permitting infection by EnvA-pseudotyped RVdG due to the high sensitivity of 
TVA”. d, Quantification for three controls (n = 4, 4, 7 animals, respectively). 
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By comparison, 672 GFP* neurons were counted in the same region for an 
experimental brain that has the lowest starter cells among the 11 brains whose 
data were used for quantitative analysis of motor cortex TRIO input tracing. 
These background cells were restricted within ~500 jm of the injection site. 
Because of these observations, GFP™ cells on sections within ~600 um of 
the injection site were excluded from the input analysis in Fig. 1g. Scale bar, 
100 jm. Error bars, s.e.m. 
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Extended Data Figure 2 | TRIO applied to rat primary motor cortex. 

a, Schematic of injection sites used for TRIO in rat motor cortex (see Fig. 1c for 
details of the viruses). Two different C regions were tested: striatum or 
contralateral motor cortex (cMC). b, c, Coronal section of rat motor cortex 
stained with DAPI (blue). Starter pyramidal neurons projecting to contralateral 
motor cortex (b) or striatum (c) (yellow, a subset indicated by arrowheads) 
can be distinguished from neurons receiving CAV-Cre and AAV-FLEx”-TC 
(red) or GFP from RVdG (green). Bottom insets, coronal sections showing 
representative presynaptic GFP™ cells in somatosensory cortex (SC) or 


thalamus (Th). These data indicate that callosal-projecting neurons and 
striatum-projecting neurons in rat motor cortex both receive direct synaptic 
input from somatosensory cortex and thalamus (” = 2 animals for CMC C 
region; n = 3 animals for striatum C region). d, e, Omitting CA V-Cre for TRIO 
in the rat also resulted in local non-specific infection of RVdG. On average 
~200 cells were observed (n = 4 animals) within 800 jim from the injection 
site in these control experiments. By comparison, 1,392 GFP* neurons were 
counted in the same region of a TRIO sample that has the lowest starter cells 
among the 5 brains analysed. Scale bars, 100 jim. Error bars, s.e.m. 
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Extended Data Figure 3 | Evaluation of CAV-Cre spread by using the 
OB-—APC projection. a, Four representative 60-11m coronal sections of the 
CAV-Cre injection site in the anterior piriform cortex (APC) of four Ail4 Cre- 
reporter mice. Red, tdTomato; green, retrobeads; blue, DAPI. The location 
of the injection site was readily visualized by concentrated retrobeads. D, dorsal; 
L, lateral; M, medial; V, ventral. In each mouse, we determined the minimal 
distance (D) between the injection site and layer 1a of the piriform cortex, 
where mitral cell axons terminate, or lateral olfactory tract, where mitral cell 
axon bundles are present. Dashed lines represent the boundary between 
layer 1a and layer 1b. For each sample, we counted the number of tdTomato- 
labelled mitral cells (numbers below each image) from serial olfactory bulb 
(OB) sections. b, An example 60-11m coronal section of the OB. Both tdTomato 
and retrobeads signals were found to be mostly restricted to the mitral cell 


layer (M) of the main olfactory bulb (MOB) and accessory olfactory bulb 
(AOB) with minor labelling in the granule cell layer (Gra). As AOB mitral cells 
do not form synapses in the APC, this observation indicates that CAV-Cre 
can infect axons-in-passage. c, Distribution of D among 26 injections (x axis) 
and relationship between D and the numbers of labelled cells in the MOB 

(y axis). d, Histogram based on c. Dense labelling (over 1,000) was obtained 
only when D < 100 jm. CAV-Cre injections with D > 800 um rarely labelled 
the OB (2.8 + 1.9 cells per bulb, n = 4 animals). e, Cumulative distribution plot 
of MOB cell counts. A sample of the ninth smallest D (D = 200 tm) reached 
90% of the labelling (indicated by vertical dotted line) detected in all 26 samples, 
suggesting that given our sample distribution, ~90% of axonal transduction 
occurred within 200 jm from the CAV-Cre injection site. Scale bars, 100 tm. 
Error bars, s.e.m. 
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Extended Data Figure 4 sai Evaluation of retrograde infection by CAV-Cre. 
a, Representative coronal sections of the injection sites where CAV-Cre plus 
retrobeads were delivered into the olfactory bulb, dorsal hippocampus, auditory 
cortex, cerebellum or medulla of the Ail4 Cre-reporter mice (see Methods 
for coordinates). Red, tdTomato; green, retrobeads; blue, DAPI. tdTomato 
labelling was densest at the injection site, and corresponded with the presence 
of retrobeads. We did not observe dense tdTomato or retrobeads labelling in 
other brain regions adjacent to the injection site unless these sites sent direct 
projections to the injection site, indicating that for our experiments, CA V-Cre 
was efficiently and specifically delivered to the targeted brain regions. n = 4 
animals per injection site. b, Representative coronal sections of brain regions 
that contained tdTomato™ labelling of specific cell populations known to 
project to CAV-Cre injection sites. The following is a partial list: neurons 


projecting to olfactory bulb (first column): ipsi- and contralateral anterior 
olfactory nucleus (AON), piriform cortex (Pir), nucleus of the lateral olfactory 
tract (nLOT), but not contralateral olfactory bulb; to dorsal hippocampus 
(second column): lateral and medial septum (LS, MS) and entorhinal cortex 
(Ent); to auditory cortex (third column): somatosensory cortex (SC), entorhinal 
cortex (Ent), and medial geniculate nucleus (MGN); to cerebellum (fourth 
column): contralateral pontine nuclei (PN) and inferior olive (IF); to medulla 
(fifth column): insular cortex (Ins), central amygdala (CeA), and 
paraventricular hypothalamic nucleus (PVH). Coronal images are composites 
generated from overlapping tiled images. Insets show high magnification 
images of boxed regions. Bottom, sagittal schematic of the CAV-Cre injection 
sites (a) and the approximate location of the two representative coronal sections 
above. Scale bars, 1 mm; inset, 100 tum. 
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Extended Data Figure 5 | Controls for Dbh-Cre-based trans-synaptic 
tracing and TRIO analysis in locus coeruleus. a, A Representative coronal 
section of the locus coeruleus from a mouse heterozygous for Dbh-Cre and Ail4 
Cre-reporter transgenes. Sections were labelled with an antibody against 
tyrosine hydroxylase (TH), an enzyme in the biosynthetic pathway for 
noradrenaline (green), while cells expressing Cre recombinase are visible by 
expression of tdTomato (red). b, Quantification of the number of tdTomato* 
neurons in the locus coeruleus that were also labelled by TH antibody (n = 3 
animals). Every 50-j1m section through the locus coeruleus was collected for 
quantification. Qualitatively, all TH* cells expressed tdTomato; however, we 
cannot determine quantitatively because we could not accurately count TH* 
cells due to dense process staining. c, Top, schematic for negative control where 
AAVs that express Cre-dependent TVA-mCherry fusion (TC) and rabies 
glycoprotein (G) were injected into the locus coeruleus of wild-type mice, 
followed by injection of RVdG. Middle, coronal section of the locus coeruleus 
stained with DAPI (blue) shows a small number of GEP™ neurons at the 
injection site. The dotted rectangle highlights GFP* neurons magnified in the 
bottom panel. d, Top, in this negative control, Dbh-Cre mice received Cre- 
dependent TVA-mCherry fusion (without rabies glycoprotein) via AAV 
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injection into the locus coeruleus, followed by RVdG. Middle, a coronal 
section of the locus coeruleus stained with DAPI (blue) shows infection of Cre* 
locus coeruleus neurons with TC (red) or TC and RVdG (yellow) at the 
injection site. The dotted rectangle highlights infected locus coeruleus neurons 
magnified in the bottom panel. Most green cells are also red. No GFP* cells 
were observed outside the region immediately adjacent to the injection site, 
indicating that trans-synaptic tracing depends on rabies glycoprotein. 

e, Quantification of the number of GFP™ cells (c), or GEP™ cells that did not 
colocalize with TC (d), that were observed in the experiments described in 
(c, n = 8 animals) and (d, m = 6 animals). By comparison, 1,381 GFP* neurons 
were counted in the same region for an experimental brain that has the median 
number of starter cells among the 9 brains. For explanation of background 
labelling, see Extended Data Fig. la-c. In either case, no GEP* neurons were 
visible >800 jum away from the injection site. f, Schematic of brain regions 
quantified for presynaptic GFP~ neurons. Regions approximately 800 1m 
anterior and posterior to the centre of the locus coeruleus were excluded from 
analysis due to local background labelling from TVA-mCherry fusion and 
GEP. Scale bars, 50 tm (a), 1 mm (c, d, middle panels), 100 um (c, d, bottom 
panels). Error bars, s.e.m. 
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Extended Data Figure 6 | Purkinje cell axons contact noradrenaline 
processes in the locus coeruleus. a, Coronal sections counterstained with 
DAPI (blue) showing representative GEP* Purkinje cells (green) from Dbh-Cre 
trans-synaptic tracing experiments described in Fig. 2. Labelled Purkinje 

cells span the anterior—posterior axis, but are enriched in the medial portion of 
the ipsilateral cerebellum. b, Sagittal section through the locus coeruleus of 
mice heterozygous for the transgenes Pcp2-Cre and Ai14, in which tdTomato 
(tdT) expression was restricted to cerebellar Purkinje cells and their processes 
(red). Sections were labelled with DAPI (blue) and anti-TH antibody (green) 
to label LC-NE neurons. The right panel is a maximum-projection confocal 
stack taken with a 40 objective of the boxed region in the left panel. Purkinje 
cell axons are intermingled with TH* locus coeruleus neurons and their 
processes. c, Left, Representative image of a horizontal section collected 
through the locus coeruleus of a mouse heterozygous for the transgenes 
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sagittal 


Pcp2-Cre and Ail4, Sections were stained with anti-TH antibody (green) to 
label LC-NE neurons and their processes, and anti-gephyrin (geph) antibody 
(white) to label inhibitory post-synaptic densities. Middle, maximum- 
projection confocal stack taken with a 40X objective of the dashed box of the 
left panel showing the overlap between tdTomato™ Purkinje cell axons and 
TH" LC processes. Right, high magnification of the dashed box of the middle 
panel, showing that several of these contact points also contained gephyrin* 
puncta (arrowheads) within green processes apposing the red processes, 
consistent with GABAergic Purkinje cell axons forming synapses onto 
dendrites of TH* LC-NE neurons. Images in a were derived from larger 
composite images generated by a Leica Ariol Slide Scanner. A, anterior; D, 
dorsal; L, lateral; M, medial; P, posterior; V, ventral. Scale bars, 1 mm (a; b and 
c, left), 100 um (a, inset; b, right), 10 um (c, middle and right). 
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Extended Data Figure 7 | Spatial distribution of LC-NE neurons projecting 
to distinct output brain regions. a, Representative images of individual LC- 
NE neurons labelled within the locus coeruleus by injection of CAV-Cre at 
specific output sites (see Extended Data Fig. 4a) in Ail4 Cre-reporter mice. 
Coronal sections through the locus coeruleus were collected in order and 
stained with anti-TH antibody (pseudocoloured in green). All tdTomato 
(tdT)* neurons within the locus coeruleus were also TH, and many of these 
cells also contained retrobeads (green in inset). Injection of CAV-Cre into the 
olfactory bulb, hippocampus or auditory cortex resulted in high tdTomato 
expression in NE‘ neurons within the locus coeruleus, whereas tdTomato 
labelling was almost completely absent in adjacent brain regions, indicating 
that regions next to the locus coeruleus contribute minimal projections to these 
output sites. However, CAV-Cre injected into the cerebellum or medulla 
labelled NE* locus coeruleus neurons as well as adjacent, NE cell populations 
(a subset of which are highlighted by arrowheads). b, The locations of 
tdTomato* LC-NE neurons from sequential 50-j1m coronal sections collected 
through the entire locus coeruleus were transferred to corresponding sections 
of a digital locus coeruleus model and are represented by coloured dots (see 
Methods). c, Schematic of the dorsal/ventral and medial/lateral classifications 
used with tdTomato’ LC-NE neurons occurring from CA V-Cre injections into 
the olfactory bulb (left) or cerebellum (right) of Ail4 mice. These classifications 
were made by drawing horizontal and vertical lines through the cross 

(b) designating the middle of each locus coeruleus section. d, Quantification of 
the fraction of td Tomato’ LC-NE cells in each locus coeruleus section along the 
anterior—posterior axis of the locus coeruleus. No significant differences were 
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observed for the anterior-posterior distribution of tdTomato* LC-NE neurons 
projecting to different output sites. e, Quantification of the medial-lateral 
distribution of LC-NE neurons projecting to different output sites. LC-NE 
neurons showed no bias in the medial versus lateral portion of the locus 
coeruleus, regardless of where they sent projections. f, Quantification of the 
dorsal-ventral distribution of tdTomato’ LC-NE neurons projecting to 
different output sites. Although no bias was observed in the posterior locus 
coeruleus, significant differences were observed in the anterior and mid-LC. 
Specifically, LC-NE neurons projecting to the forebrain showed a dorsal bias for 
tdTomato “ cell labelling within the anterior locus coeruleus, whereas LC-NE 
neurons projecting to the cerebellum and medulla were located in more ventral 
portions of the anterior- and mid-LC. n = 4 animals per CAV injection site. 
Data in d was analysed with one-way ANOVA. Data in e, f were analysed by 
first performing two-way ANOVA, which did not uncover any significance in 
the medial/lateral bias of tdTomato’ LC-NE neurons. Two-way ANOVA 
determined that (1) the location of the CAV injection site contributes to the 
dorsal/ventral bias of tdTomato* LC-NE neurons within the locus coeruleus 
(P < 0.0001), (2) there is interaction between the CAV injection site and the 
location (anterior, mid, posterior) of tdTomato* NE neurons within the locus 
coeruleus (P = 0.0389), and (3) the locus coeruleus subdivisions themselves did 
not significantly contribute to the variance observed in tdTomato* LC-NE 
neurons. One-way ANOVA and post hoc Tukey’s multiple comparison were 
then performed to test the significance of dorsal/ventral bias in each locus 
coeruleus region based on CAV injection sites. Scale bars, 50 jum. Error bars 
represent s.e.m. *P < 0.05; **P < 0.01, ***P < 0.001. 
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Extended Data Figure 8 | Controls for locus coeruleus cTRIO. a, Top, 
schematic for negative controls where AAVs expressing Flp-dependent TVA- 
mCherry fusion and rabies glycoprotein were injected into the locus coeruleus 
of Dbh-Cre mice, followed by RVG injection into the locus coeruleus, but 
the CAV-FLEx’**”-Flp injection was omitted. Middle, coronal section of the 
locus coeruleus stained with DAPI (blue) shows a small number of GEP* 
neurons at the injection site. The dotted rectangle highlights GFP* neurons 
magnified in the lower panel. b, Top, schematic for negative control where 
CAV-FLEx'**-Fip was injected into the olfactory bulb and AAVs expressing 
Flp-dependent TVA-mCherry fusion and rabies glycoprotein were injected 
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into the locus coeruleus of wild-type mice, followed by RVdG injection; hence 
there was no Cre to mediate Flp expression in locus coeruleus cells. Middle, 
coronal section of the locus coeruleus stained with DAPI (blue) shows a 
small number of GFP™ neurons at the injection site. The dotted rectangle 
highlights GFP* neurons magnified in the lower panel. c, Quantification of 
GFP* background labelling in the locus coeruleus (n = 4 and 8 animals). This 
labelling is likely caused by leaky TVA expression as discussed in Extended 
Data Fig. 1. In none of these control experiments did we observe GFP" or TC* 
neurons >800 jim away from the injection site. Scale bars, 1 mm (middle 
panels), 100 tum (lower panels). Error bars represent s.e.m. 
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Extended Data Figure 9 | Simulation of input convergence in Dbh-Cre 
tracing experiments. In the sparsest Dbh-Cre trans-synaptic tracing brain, 4 
starter cells received input from 43 distinct input regions (309 input neurons, 
see Supplementary Table 2, sample number 8). In the second sparsest sample, 
22 starter cells received input from 66 distinct input regions (756 input neurons; 
see Supplementary Table 2, sample number 9). a, The relation between the 
number of input regions for each LC-NE starter cell and the probability of 
observing >42 (left) or >65 (right) input regions in simulation, assuming that 
each starter cell receives input from a given region with the same probability. As 
the number of input regions per starter cell increases, the probability of 
observing inputs from >42 or >65 regions also increases. Based on a threshold 
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of P value <0.001, these simulations suggest that, to account for the total 
number of observed input areas in each brain sample, there must be individual 
LC-NE neurons that receive input from more than 15 regions for the sparsest 
sample (red dot, left) or more than 9 regions for second sparsest sample 

(red dot, right). b, Detailed view of the distribution of simulation results 
corresponding to the red dots in a. Assuming that each cell receives input from 
15 (left) or 9 (right) distinct regions, only 5 (left) or 6 (right) out of 10,000 
simulations label >42 (left) or >65 (right) input regions. Note that if the 
assumption that each starter cell receives input from the same number of 
regions does not apply, then there must be at least one cell receiving input 
from more regions than the number specified in the simulation. 
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Extended Data Figure 10 | Representative images and distribution of 
individual samples for projection-based viral-genetic labelling experiments. 
a, Representative images from sagittal sections of TC LC-NE axons in 8 brain 
regions indicated at the top of each column (the last column shows cell bodies 
for LC-NE neurons) resulting from CAV injections at four projection sites 
indicated on the left (top four rows), or AA V-FLEx’*?-TC injection at the locus 
coeruleus of Dbh-Cre animals (bottom row). All TC* processes were confirmed 
to contain noradrenaline transporter (NET, an NE neuron marker) by anti- 
NET immunostaining (not shown; see Fig. 4 inset). b, The normalized fraction 
of TC* LC-NE axons for individual experiments for five conditions are colour 
coded on the top right. Filled symbols represent experiments where Dbh-Cre 


mice were used along with CAV-FLEx'*"”-Flp; open symbols represent 
experiments where wild-type mice were used along with CAV-Cre. The 
distribution of individual samples with regards to the fraction of TC* axons 
observed at output sites was similar between wild-type and Dbh-Cre mice. 
Collectively, the samples for each condition were averaged to quantify the 
normalized fraction of TC’ LC-NE axons in each brain region as reported in 
Fig. 4d. Scale bar, 50 um. Error bars represent s.e.m. Abbreviations: AC, 
auditory cortex; CC, cingulate cortex; Cb, cerebellum; Hi, hippocampus; Hy, 
hypothalamus; LC, locus coeruleus; Me, medulla; OB, olfactory bulb; SC, 
somatosensory cortex. 
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A novel Ebola virus (EBOV) first identified in March 2014 has 
infected more than 25,000 people in West Africa, resulting in more 
than 10,000 deaths'”. Preliminary analyses of genome sequences of 
81 EBOV collected from March to June 2014 from Guinea and 
Sierra Leone suggest that the 2014 EBOV originated from an inde- 
pendent transmission event from its natural reservoir’ followed by 
sustained human-to-human infections‘. It has been reported that 
the EBOV genome variation might have an effect on the efficacy of 
sequence-based virus detection and candidate therapeutics”®. 
However, only limited viral information has been available since 
July 2014, when the outbreak entered a rapid growth phase’. Here 
we describe 175 full-length EBOV genome sequences from five 
severely stricken districts in Sierra Leone from 28 September to 
11 November 2014. We found that the 2014 EBOV has become 
more phylogenetically and genetically diverse from July to 
November 2014, characterized by the emergence of multiple novel 
lineages. The substitution rate for the 2014 EBOV was estimated to 
be 1.23 x 107° substitutions per site per year (95% highest poster- 
ior density interval, 1.04 x 10~* to 1.41 X 107° substitutions per 
site per year), approximating to that observed between previous 
EBOV outbreaks. The sharp increase in genetic diversity of the 
2014 EBOV warrants extensive EBOV surveillance in Sierra 
Leone, Guinea and Liberia to better understand the viral evolution 
and transmission dynamics of the ongoing outbreak. These data 
will facilitate the international efforts to develop vaccines and ther- 
apeutics. 

A large-scale Ebola viral disease (EVD) outbreak has been ongoing 
in Western Africa for nearly a year, with more than 23,000 reported 
cases’. Previous findings have shown that the causative agent is a novel 
Ebola virus (EBOV)*. Among the three West African countries with 
widespread and intense EBOV transmission, Sierra Leone reported the 
largest number of confirmed cases, approximately 58% of the total 
confirmed EBOV infection cases. To help Sierra Leone fight against 
EVD, the Chinese government dispatched the China Mobile 
Laboratory Testing Team (CMLTT) in September upon request of 
the Sierra Leone government. The CMLTT, equipped with medical 
experts who specialize in laboratory testing, epidemiology, and run- 
ning a holding and treatment centre, has kept working at the Sierra 
Leone-China Friendship Hospital at Jui Town (represented as a red 


star in Fig. la) of Western Area, approximately 30 km southeast of 
Freetown, the capital city of Sierra Leone. All the activities of the 
CMLTT were coordinated by the Emergency Operations Center 
jointly established by the Ministry of Health and Sanitation of Sierra 
Leone and the World Health Organization (WHO). 

To fight against this novel EBOV, Gire and colleagues systematically 
analysed 81 EBOV genomes from Guinea (n = 3)? and Sierra Leone 
(n = 78)* collected from the early stage of the 2014 EBOV outbreak, 
revealing the origin, transmission, and rapid accumulation of genetic 
variation of the 2014 EBOV. However, only a few additional full-length 
EBOV genome sequences were published since July 2014, when the 
outbreak entered a rapid growth phase driven by sustained human-to- 
human transmission*. From 28 September to 11 November 2014, a 
total of 823 samples were tested to be EBOV-positive using reverse 
transcription-PCR (RT-PCR) by the CMLTT, among which 175 full- 
length genomes were successfully sequenced with each from an indi- 
vidual EVD patient (Fig. la and Supplementary Table 1). These 175 
samples were obtained from five severely stricken districts in Sierra 
Leone, including 47 from Western Urban, 67 from Western Rural, 47 
from Port Loko, 5 from Kambia, and 9 from Bombali (Fig. 1a). In 
detail, approximately one fifth of the EBOV-positive samples for each 
region were sequenced, 19.5% for Western Urban, 21.2% for Western 
Rural, 22.1% for Port Loko, and 16.1% for Kambia. Regarding 
Bombali, 9 out of 17 (52.9%) strains were sequenced. Therefore, our 
sequenced genomes were roughly proportional to the prevalence in 
different regions. 

Phylogenetic analysis of all available full-length EBOV genome 
sequences from Sierra Leone (n= 253) and Guinea (n = 3) from 
2014 was performed using MrBayes® in which the three Guinean 
strains were designated as root*’. Our phylogenetic analysis showed 
that the 2014 EBOV increased in diversity at least through October 
after its initial introduction into Sierra Leone (Fig. 1b and Extended 
Data Fig. 1). Apart from the previously described lineages SL1 and 
SL2*, the SL3 lineage has evolved into two major lineages, SL3.1 and 
SL3.2 in June in eastern Sierra Leone, both of which were then trans- 
mitted to western Sierra Leone. The majority of the EBOV collected 
from late September to mid-November fell into lineage SL3.2, with a 
few belonging to lineage SL3.1. However, none of them belonged to 
lineages SL1 and SL2. In particular, the EBOV sequenced by us could 
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Figure 1 | Geographical distribution and phylogenetic analysis of the 2014 
EBOV from Sierra Leone. a, Geographical distribution of the 823 EBOV 

positive samples and the 175 newly sequenced genomes (represented as blue 
dots). In the panel, main roads and waterways are showed as yellow lines and 


be classified into seven novel independent sublineages based on the 
phylogenetic topology, two sublineages belonging to SL3.1 (SL3.1.1 
and SL3.1.2) and five belonging to SL3.2 (SL3.2.1 to SL3.2.5) 
(Fig. 1b). Phylogenetic tree constructed using the maximum likelihood 
method showed a similar topology (Extended Data Fig. 2). Therefore, 
the 2014 EBOV has become highly diverse in its first year along with its 
spread in Sierra Leone. 

To explore the spatiotemporal relationships of the EBOV in western 
Sierra Leone, we performed a phylogeographic analysis using BEAST’® 
(Fig. 2 and Extended Data Fig. 3). In this analysis, only 22 out of the 78 
sequences previously published by Gire et al. (ref. 4) were included in 
our analysis to reduce the computation load. To this end, we selected 
representative sequences from the previously described lineages GIN, 
SLI, SL2 and SL3, ensuring that there is at least one sequence for every 
sampling date. From a time point of view, all of the novel sublineages 
probably emerged before August (Fig. 2). In addition, multiple lineages 
were co-circulating in a single town/district. All of the seven subli- 
neages were identified in Waterloo, indicating the highest phylogenetic 
diversity in this region. Viruses from Freetown belonged to six of the 
seven sublineages, with sublineage 3.2.3 undetected. Five novel sub- 
lineages have also been found in Maforki Chiefdom of Port Loko. 

The spatiotemporal linkage of our sequenced EBOV genomes is 
further shown in Fig. 3a. First, viruses from Freetown and Waterloo, 
the capital and the traffic hub, are estimated to be spatiotemporally 
related, as also observed in sublineages 3.1.1, 3.1.2 and 3.2.4 (Fig. 2), 
indicating that frequent transmission events might have occurred 
between the two regions. Second, this network reveals that viral 
transmission events have also occurred between the three major 
sites (Freetown, Waterloo and Maforki Chiefdom) and their sur- 
rounding regions. Third, our results also suggest spatiotemporal con- 
nections of EBOV between Waterloo and Port Loko, Kambia, and 
Bombali, respectively, as exemplified in sublineages 3.2.1 and 3.2.5 
(Fig. 2). Based on the higher transmission rates of Waterloo, 
Freetown and Maforki Chiefdom, intensive EBOV surveillance in 
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black dash lines, respectively. b, A Bayesian phylogenetic tree of the 2014 
EBOV. The 175 newly sequenced viruses in this study are shown in colours, 
and others are shown in grey. The seven novel lineages designated in the 
present are highlighted. Posterior support for major nodes is shown. 
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Figure 2 | Phylogeographic reconstruction of the 2014 EBOV using BEAST. 
In the left panel, the novel 175 EBOV genome sequences were coloured by 
geographic regions. The transition of different colours represents a potential 
transmission event. In the right panel, the number of sequences from different 
geographic regions in each lineage is summarized. 
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Figure 3 | Reconstructed phylogeographic linkage, substitution rate, and 
effective population size of the 2014 EBOV in western Sierra Leone from 
September to November 2014. a, The phylogeographic linkage constructed 
using BEAST. Thickness of lines represents the relative transmission rate 
between two regions. The size of each node is proportional to the sum of the 
relative rates of the region with Bayes factor >3. b, Substitution rates of the 


the three regions should be helpful for the prevention and control of 
the EVD outbreak in Western Sierra Leone. 

The substitution rate for all of the 2014 EBOV was estimated using 
BEAST to be 1.23% 10° substitutions per site per year (95% 
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highest posterior density interval, 1.04 X 10 * to 1.41 X 10° substi- 
tutions per site per year) (Fig. 3b). Our estimate was similar to those 
between previous EBOV outbreaks, approximately 1.00 X 107? sub- 
stitutions per site per year*'’"*. This suggests that, over a longer 
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time interval, EBOV is still undergoing evolution at a relatively con- 
stant rate. 

The estimated population size of the 2014 EBOV from Sierra Leone 
steadily increased from July to early October, and then entered 
a plateau period (Fig. 3c). This therefore implies that the effective 
population size of the 2014 EBOV became stable in October, which 
was also broadly consistent with the weekly change of numbers of 
confirmed EBOV infection cases and EVD patients in Sierra Leone 
(Fig. 3c)’. The doubling time estimated using BEAST was 22.1 days 
(95% confidence interval, 18.9-25.59 days), which was comparable to 
that calculated using the epidemiological data from Sierra Leone, with 
the mean value of 18.9 days. 

We then investigated the molecular characterization of the novel 
EBOV genome. Raw reads of each genome were mapped to the 
reference genome (KJ660346.2). The average normalized coverage 
was approximately 1,400-fold (Fig. 4a). 341 single nucleotide 
polymorphisms (SNPs) have been previously identified between the 
2014 outbreak EBOV and previous EBOV’*, and 440 SNPs were iden- 
tified in our sequenced genomes. The substitutions in the 175 newly 
sequenced EBOV genomes were summarized among different lineages 
(Fig. 4b and Supplementary Table 2). Approximately a quarter of the 
identified substitutions were non-synonymous, and half of them were 
synonymous (Extended Data Fig. 5). Some of the SNPs were lineage- 
specific and could be used as markers to distinguish different lineages 
(Fig. 4b and Supplementary Table 2). For example, substitutions 
A7148G and A17445G were only found in sublineage 3.1.2, whereas 
sublineage 3.2.4 possessed a specific T5849C substitution. The T > C 
substitutions that occurred in the 3’ UTR region of NP gene (at gen- 
ome positions 3008 and 3011) were specific to sublineage 3.2.5. In 
particular, the T > C substitution at position 14019 occurred in all 
sequences of lineage 3.2, which was first described in this study. 
Moreover, seven previously reported substitutions (at positions 800, 
1849, 6283, 8928, 10218, 15963, 17142)* were always present in the 
novel lineages from June to November 2014 and became the dominant 
allele in the population, suggesting that they have been fixed. These 
substitutions included two non-synonymous substitutions (C800T in 
the NP gene and C6283T in the GP gene), four synonymous substitu- 
tions, and one in the non-coding regions. 

Interestingly, we observed several serial T > C substitutions in six 
newly sequenced EBOV genomes, which occurred within a genome 
region of 150 base pairs in length (Fig. 4c and Extended Data Fig. 4). 
The serial T>C substitutions were further confirmed by Sanger 
sequencing after PCR amplification (Extended Data Table 1). Such 
serial substitutions were found in four different regions of six strains 
belonging to three different lineages, two of which were in coding 
regions and the other two were in non-coding regions. However, the 
emergence mechanism of such serial T >C substitutions and their 
potential biological functions warrant further investigation. 

In summary, our findings highlighted the increasing genetic divers- 
ity and transmission dynamics of the 2014 EBOV, with an evolution- 
ary rate estimated to be similar to that between previous EBOV 
outbreaks. This information provided an insight into the viral evolu- 
tion and transmission dynamics, which would facilitate the prevention 
and control of EBOV in Sierra Leone and would also guide research on 
vaccines and therapeutic targets. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 30 January; accepted 23 April 2015. 
Published online 13 May 2015. 


1. World Health Organization. Ebola response roadmap - Situation report. http:// 
www.who.int/csr/disease/ebola/situation-reports/en (accessed 1 April 2015). 


96 | NATURE | VOL 524 | 6 AUGUST 2015 


2. Baize, S et al. Emergence of Zaire Ebola virus disease in Guinea. N. Engl. J. Med. 
371, 1418-1425 (2014). 

3. Leroy, E. M. et al. Fruit bats as reservoirs of Ebola virus. Nature 438, 575-576 
(2005). 

4. Gire, S. K. et al. Genomic surveillance elucidates Ebola virus origin and 
transmission during the 2014 outbreak. Science 345, 1369-1372 (2014). 

5. Kugelman,J.R. eta/. Evaluation of the potential impact of Ebola virus genomic drift 
on the efficacy of sequence-based candidate therapeutics. MBio 6, e€02227-14 
(2015). 

6. Feldmann, H. et a/. Ebola virus: from discovery to vaccine. Nature Rev. |mmunol 3, 
677-685 (2003). 

7. WHO Ebola Response Team. Ebola virus disease in West Africa-the first 9 
months of the epidemic and forward projections. N. Engl. J. Med. 371, 1481-1495 
(2014). 

8. Huelsenbeck, J. P. & Ronquist, F. MRBAYES: Bayesian inference of phylogenetic 
trees. Bioinformatics 17, 754-755 (2001). 

9. Dudas G., Rambaut A. Phylogenetic analysis of Guinea 2014 EBOV Ebolavirus 
outbreak. PLoS Curr. http://dx.doi.org/10.1371/currents.outbreaks.84eefe5ce43 
ec9dcObf0670f7b8b417d (2014). 

10. Drummond, A. J. & Rambaut. A. BEAST: Bayesian evolutionary analysis by 
sampling trees. BMC Evol. Biol. 7, 214 (2007). 

11. Jenkins, G. M. et al. Rates of molecular evolution in RNA viruses: a quantitative 
phylogenetic analysis. J. Mol. Evol. 54, 156-165 (2002). 

12. Calvignac-Spencer S., et al. Clock rooting further demonstrates that Guinea 2014 
EBOV is a member of the Zaire lineage. PLoS Curr. http://dx.doi.org/10.1371/ 
currents.outbreaks.cOe035c86d721668a6ad7353f7f6fe86 (2014). 

13. Carroll, S.A. et a/. Molecular evolution of viruses of the family Filoviridae based on 
97 whole-genome sequences. J. Virol. 87, 2608-2616 (2013). 

14. Li, Y.H. & Chen. S. P. Evolutionary history of Ebola virus. Epidemiol. Infect. 142, 
1138-1145 (2014). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank P. Lemey and S. Ho for technical assistance. This work is 
partially supported by the special project of Ebola virus research from the President 
Foundation of Chinese Academy of Sciences. It was also supported by grants from the 
China Mega-Project on Infectious Disease Prevention (nos 2013ZX10004202-002, 
2013ZX10004605), China Mega-Project on Major Drug Development (no. 
2013ZX09304101) and the National Hi-Tech Research and Development (863) 
Program of China (nos 2014AA021402, 2014AA021501). We thank the government 
of Sierra Leone, the Sierra Leone Ministry of Health and Sanitation and the Chinese 
National Health and Family Planning Commission. We also thank the medical workers 
and volunteers in Sierra Leone. G.F.G. is a leading principal investigator of Innovative 
Research Group of the National Natural Science Foundation of China, NSFC) 

(grant no. 81321063). 


Author Contributions The manuscript was written by Y.-G.T., W.-F.S., D.L, G.F.G. 

and W.-C.C. Samples were collected by J.Q., D.K., F.D., A.K., B.K., Y.S., H.-J.L, X.-G.Z,, F-Y., 
Y.H., Y.-X.C., Y.-Q.D., H.-XS., Y.S., W.-S.L, Z.W., C.-Y.W., Z.-Y.B., Z.-D.G., L.-B.Z., W.-M.N., 
C.-Q.B., C.-H.S., Y.F., Z.-P.X., X.-X.Z., S.-T.Y. and B.L. Experiment and data analysis were 
performed by Y.-G.T., W.-F.S., D.L., H.F., M.N., H.-G.R., J.L, YJ., Y.T., Z.L, C.-C.C., Z.-H.L, 
H.J., Y.L, X.-P.A., P.-S.X., X-L-LZ., Y.H., Z.-Q.M., D.Y., H.-W.Y,, J.-F.J., X-C.B., LL, F.-C.H. 
and W.-C.C. The study was designed by B.K., X.-C.B., L.L, J.Q., F.-C.H., G.F.G. and W.-C.C. 


Author Information The 175 newly sequenced genomes have been submitted to 
GenBank. The accession numbers are provided in Supplementary Table 1. Reprints 
and permissions information is available at www.nature.com/reprints. The authors 
declare no competing financial interests. Readers are welcome to comment on the 
online version of the paper. Correspondence and requests for materials should be 
addressed to W.-C.C. (caowc@brmi.ac.cn), G.F.G. (gaof@im.ac.cn) or F.-C.H. 
(hefc@nic.bmi.ac.cn). 


The China Mobile Laboratory Testing Team in Sierra Leone 


Yi-Gang Tong?, Jun Qian’, Yang Sun, Hui-Jun Lu?, Xiao-Guang Zhang®, Fan Yang‘, Yi 
Hu?, Yu-Xi Cao’, Yong-Qiang Deng', Hao-Xiang Su*, Yu Sun?, Wen-Sen Liu, Zhuang 
Wang?, Cheng-Yu Wang’, Zhao-Yang Bu?, Zhen-Dong Guo%, Liu-Bo Zhang®, Wei-Min 
Nie®, Chang-Qing Bai’, Chun-Hua Sun!, Yong Feng®, Jia-Fu Jiang! & George F.Gao??1° 


lState Key Laboratory of Pathogen and Biosecurity, Beijing 100071, China. *Key 
Laboratory of Jilin Province for Zoonosis Prevention and Control, Changchun 130122, 
China. *Institute for Viral Disease Control and Prevention, Chinese Center for Disease 
Control and Prevention, Beijing 102206, China. “Chinese Academy of Medical Sciences & 
Peking Union Medical College, Beijing 100730, China. “Institute of Environmental Health 
and Related Product Safety, Chinese Center for Disease Control and Prevention, Beijing 
100021, China. °The No. 302 Hospital, Beijing 100039, China. 7The No. 307 Hospital, 
Beijing 100071, China. Department of international cooperation, National Health and 
Family Planning Commission, Beijing 100044, China. °Institute of Microbiology, Chinese 
Academy of Sciences, Beijing 100101, China. !°Chinese Center for Disease Control and 
Prevention, Beijing 102206, China. 


©2015 Macmillan Publishers Limited. All rights reserved 


METHODS 


Ethics statement. This work was conducted as part of the surveillance and public 
health response to contain the EVD outbreak in Sierra Leone. Blood samples from 
suspected individuals and oropharyngeal swab samples from corpses were col- 
lected for EVD testing and outbreak surveillance with a waiver to provide a written 
informed consent during the EVD outbreak under the agreement between the 
Sierra Leone government and Chinese government. The activities were coordi- 
nated by the Emergency Operations Centre in the charge of Sierra Leone Ministry 
of Health and Sanitation and WHO. All the information regarding individual 
persons has been anonymized in the report. 

Genome sequencing and assembly. RNA samples extracted from whole blood 
from 175 EVD patients were reverse transcribed to cDNA. PCR amplifications 
were performed with EBOV-specific primer pairs with overlaps. Amplicons from 
one patient were pooled for library preparation. Next generation sequencing (NGS) 
was performed using the BGISEQ-100 (Ion Proton) platform. All the sequenced 
reads were filtered to remove the low quality and short reads. The genome 
sequences of the viruses were assembled by mapping the filtered reads to the 
2014 EBOV consensus sequence using Roche 454 Newbler version 2.9 (Roche), 
and the mutation site was manually checked with original sequencing data. 
Phylogenetic and phylogeographic reconstruction. All previously published 
EBOV genome sequences and our newly released 175 sequences were aligned 
using MAFFT v7.058"°. Phylogenetic analyses were performed using MrBayes*® 
v3.2 (10 million generations) and RAXML v8.1.6 (1000 bootstrap replicates), with 
the GTR model of nucleotide substitution and y-distributed rates among sites. 
Phylogeographic reconstruction of the 2014 EBOV was estimated using BEAST 
v1.8.0", with a continuous time Markov Chain (CTMC) over discrete sampling 
locations. The 175 newly sequenced samples in this paper were grouped into 7 
regions (Waterloo, Freetown, Rest of Western, Maforki Chiefdom, Rest of Port 
Loko, Bombali and Kambia). Bayesian Markov chain Monte Carlo analysis was 
run for 100 million steps, 10% of which were removed as burn-in and sampled 
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every 10,000 steps. Bayes factor tests were performed to provide statistical support 
for potential transmission routes between different geographic locations using 
SPREAD v1.0.6"°. Bayes factors for rates were derived from a Bayesian stochastic 
search variable selection procedure. The phylogeographic linkage was constructed 
by routes with Bayes factor values >3. 

Substitution rates and population dynamics. The substitution rates were esti- 
mated using Bayesian Markov chain Monte Carlo (MCMC) as implemented 
in BEAST v1.8.0. In this analysis, two data sets were compiled, with one 
including all the 2014 EBOV sequences and the other including sequences from 
September to November, 2014. We performed two independent runs for 100 
million generations, sampling every 10,000 steps. In addition, to accurately estim- 
ate the substitution rate, we repeated this analysis using a previously described 
data set using the same parameters. Population dynamics of the 2014 EBOV in 
Sierra Leone was estimated using a flexible non-parametric Bayesian skyride 
model” incorporated in BEAST v1.8.0, with the HKY+I° model and a strict 
molecular clock. 

Molecular characterizations of the 2014 EBOV. SNPs were called directly 
from the sequence alignment using the CLC Genomic Workbench v7.5.1, 
GeneiousR8 and Newbler v2.9. The earliest strain of EBOV 2014, H.sapiens-wt/ 
GIN/2014/Makona-Kissidougou-C15 (GenBank accession number KJ660346.2) 
was used as the reference genome. The synonymous substitutions, non-synonymous 
substitutions, and substitutions in non-coding regions were marked with 
coloured dots. 


15. Katoh, K.& Standley, D.M. MAFFT multiple sequence alignment software version 
7: improvements in performance and usability. Mol. Biol. Evol. 30, 772-780 
(2013). 

16. _Bielejec, F. et a. SPREAD: spatial phylogenetic reconstruction of evolutionary 
dynamics. Bioinformatics 27, 2910-2912 (2011). 

17. Minin, V.N. et al. Smooth skyride through a rough skyline: Bayesian coalescent- 
based inference of population dynamics. Mol. Biol. Evol. 25, 1459-1471 (2008). 


©2015 Macmillan Publishers Limited. All rights reserved 


100 


LETTER 


K850347_Gueckodey- C07, 2014-0320, GIN-Dstic-Town 
11660346 Kissdougou-O1§ 2014-05-17 “GIN Diet 


100 


1K1660348_Gueckedou-C05_2014.03-19_GIN-Distict-Town 


134550 EMDES_ 2014.05.25 SLE-Kallahun-KissiTeng 
KM034855-G3676-2_2074-06-08 SLE-Kallahun-Kiss) 
034589. Ga680, 1_ 2014.05.28 SLE-Kallahun-KissiTeng 
RMOGHS, 3685 7_2014.05-28 SLE Kallahun-Kiaaifong 
‘KM034582 Gana 1” 2074.05.28 SLE-Kallahun-KissiTong) 
WOMORHSeS C3887 T2014 0598 SLE Kalahun Kestrong 
KMO034952 EMO9S 2014.05.26 SLE-Kallahun-KissiTon 


100 


m9 


-KMos4688 i670, 1 2014-08-27 SLE-Kalahun-KissiTeng 


KMOs4657_G3677 2. 2014-05-27 SLE-Kalanun-KissiTeng 
KM034860"G3682~1"2014-05.28 SLE-Kallanun KissiTeng 


KM235043 EM120_ 2014.08.09 SLE-Kalahun-Jawe 
M2304 EM127- 2014.06.04. SLE-Kalahun-KissiKama 


Kinzaa0s4_ G3729 2014.06.07 SLE-Kalahun-lawle 
KM233055 637341 2014-06.07 SLE-Kallahun-Luawa 
KM233062_G3758_ 2014-06-11. SLE-Kalanun-Jawie 


~Khzaa065 “63769. 1-2014-06-12_SLE-Kalahun-Jawie 


EURSBOT2 Gara. 2014-00-14 SLE-Kalahun Kissing 
1M233073-G3786. 


2014-06 14SLE-Kallahun-KissiTeng 
KM233075 G3788_2014.06-14_ SLE-Kallahun-KissiTeng 


KM239077 43799 2014-06-15. SLE-Kalahun-lawie 
KM233078-G3796 2014.06.18 SLE-Kalahun-datwle 
W208 


(Ga799_ 2014-06-15. SLE-KalahunJawie 


KM235081_G3800 2014-06-15 SLE-Kalahun-Jawie 
KM233084"G3807 2014-06-15 "SLE-Kalanun-Jawle 
2300857 GoB08) 2034-06-15, SLE Kalan ie 


(23308838102 2014-06-17 _SLE-Kallahun-Jawle 


¥eu250004 G5RD0 2014-06 13 SLE Ralanun ine 
‘KW233097 


(63823 2014-06-15. SLE-Kallahun-Jawie 


KM239108 G3838 2014-06-17 SLE-Kalfhunlawie 
KM233107-G3841-2014.06-17-SLE-Bo-Kakua, 


100 


KMO34551_EM096_2014.05-26 SLE-Kallahun-KissiTeng 
Kl034886-G3679 "1 2014-05-28 SLE-Kalanun-KissiTeng 
khz33074-Gs787-2014.06-14_ SLE-Kalanun-KissiTeng 

1014-06-16 SLE-Kalanun-kissiTeng 

40158 1. 2014-11.07 SlertaLeone-WestemUrban-Freetown 
(Kh2aa035, ENT04 2014-06-02 SLEKalanun-Jawie® 

iKwza3036 ENt106 2014-0602 SLE-Kalahun-Jawe 
233087 “EM 10 "2014-06-03 "SLE-Kallahun-Jawie 
KM233038_ EM111_ 2014-06-03 SLE-Kalanun-Jawie 
Kiza309¢- E112" 2014-0605 SLE KatanunJawe 


KM233041-EM 18-2014.06.03-SLE- 
KW233047-EM124 3 2014-06-08 SLE-Kalahun-Jawie 
N233049- G3707~2014.06.06_SCE-Kallahun-Jame 

KM233050_G3713 2-2014-06.09- SLE-Kalahun-Njaluahun 

kh2a3083 63724 2014-06-05 SLE-Kallanun-lawie 
KW233056, G3735 1 2014.06.07 SLE-Kallahun-Jawie 

IkM233058"G37501- 2014-08-10. SLE-Kalahun-Jawie 
iKM233063_Ga7e4 2014-06-12" SLE-Disviet Town 
KM2a3064 “G3765-2 2014-06-14. SLE-Kallahun-Jawie 
KM233071_ G37 2014-06-12. SLE-Kalahun-Mandu 
KW233076-G3769"1 2014-06-14, SLE-Kalahun-Javie 
KW233079-G3798~2014-06.15. SLE-Kallahun-lavie, 
Kh233082~Gan05-1_2014-06-T5_ SLE-Kalahun-Javie 
KWw233089-G3814_2014.06-15, SLE-Kallahun-Jawie. 

KNi233081 G3817"2014.06-15_SLE-Kallahun-Jawie 
‘KW239092 G3818, 2014-06-15 SLE-Kalldhun-Jawle 

Kin2s3008 Gaa19 2014-06-45. SLE-Kaifahun-awie 
1KN233106_G3B40 9014-06-17. SLE-Kallahun-Jawe 
K§233108 “Gi845 "2014-06-18 SLE-Kalahun-Javie 

~kM233109 {Gasa8. 2014-00-18. SLE-Kallahun-Jawle 

KW233111 G9850_2014-06-16" SLE-Kalahun-Jawie 

KM233173 3856 1 2014-06-18. SLE-Kenema-Nongowa 
KW2a31 16_NMO42_T.20"4-06:04 SLE-Kambia-Mambole. 

KMg39042 EM119 2014-05-03 SLE-Kallahun-Jawie 

kN233070-G3770-2.2014.06-TaSLE-Kalahun-Jawie 
K¥233700_G3826_ 2014-06-16 SLE-Kalahin-Jawie 


kM233096_G3622-2014.06-18- SLE! 


lah ay 
“S0081-B20414-10.06 SierraLeone-WestenUrban-Freetown 
"J0083. T- 2014.10.06. STerraleone-WestemRural- Waterioo 
0084 T_2014-10-06_ Sierra eone-WesternRural-Waterioa 
100 ‘40082 8 2014.10-25 ‘Sierra eone-WesternRural-Waterloo 
0099.7 2014-10-28, SierraLeone-Kambia-Kambia . 
“0137 T 2014-11-01 Sierral eone-WestemRural-Waterloo 
7 2014-10-30 SierraLeone-WesternLrban-Freelown 
‘40007 8 2014.09-27 ‘Sieral.eone-WesternRural-Coldbath 
“Jo008_T_2014-09:28. SierraLedne-WesteraUtban-Freeiowm 
“J0016, 7 2014-08-0. SirvaL cone. WestemRural-Waterloo 
"30080_T_2014-70205. SieraLedne-WesternRural-Coldbath 
‘40068-T-2014-10-10~SiertaLeone-WestamnUrban-Adonkia 
“J0089.T- 2014-10-22. Sierral eone-WestemRural-Coldbath 
“J01 12. 2014-10-31, SiertaLeone-WestemUrban-Freetown 
“T01s4_T_2014-10-31_SierraLeone-WestemRural-Waterioa ~ 
40145. B 2014-11-02. SierraLeone-WesternRuural-Waterloo 


os 


100 


1 


40024"T- 2014-08-29" Sierval eone-WestemRural-Waterloo 
‘Joa2aT” 2014-10-02. Sieral-eone-Bombal:-Mapema 


400018 2014.09.27 SierraLeone-WesternRural-Jui 
‘40042. T-2014-10.04" Sierval eone-WesternRural Waterloo 
30008. 6. 2044-08-28  SierraLeone-WesternRuralJul 
"0121, T-2014-10-29. SiertaLeone-WestemUroan-Freetown 
10178°872014-1 1-10 “SierraLeone-WesternRural Sulpon 
are NO, 8 2014-11-08 SionaL.oone-WestemUrbanFrectown 


2014-11-06, SierraLeane-Western\tban-Fr 


town 


a 


40039 _8 2014-10-03 Sierral eone-Portloko-MaforkiChietdom 
‘Ja158-B-201411.06~SlerraLeone.WWestomUrban-Freetown 
40159. 82014-1105" SierraLeone-WesternRural. Waterloo 

“J01a9.T 2014-11-06. SieraL eone-WesternUtban-Freetown 
“40002 8 2014-0927 Sierral eone-WesiernUrban-Freetown 
‘J0047-72014-10.04~SierraLeone. WesternUrban-Freetown 


‘01s4 B 2014-11-03. Sierraleone-Kambia-Kambia 


fa 


Slerral cone WestemUrban-Freetown 
SierraLeone-WestomUrban-Frestown, 


J0143_B 2014-11-03, SierraLeone-Kambla-Kambia 


40171 T 2014-11-10. Sirral cone WesternRural-Allen 


0174277201411 10-SierraLeane-WesternRural-Waterloo 


10024, T_ 2014.09.28 Sierral eone-Bomball-Kambia 

Kn233061.-G5752_2014-06-10_ SLE-Kallahun-KpejeBongre 

¥233099" 3825-2. 2014-06-17 SLE-Kallahun-Malema 
‘qW2a3102, Gaa79_ 2014-06-16, SLE-Kallahun-Luawa 

KW239108 G3834_2014-06°97 SLE- 

M2331 10-Gap4a~2014-06-18-SLE- 

W233112-G3B51-2014-06-18_SLE. 


fenema-Nongowa 
M2331 15-G3857~2014-08-18_SLE-Kallahun-Luawa, 


40123. T 2014-11-01 Siertal_eone-WesternRural-Waterfoo 
"Woi52,B 2014-11-06. SierraLeone- WesternRural-Waterloo 
40160-20141. 


5 Sierral eone-WesternRural-Waterioo 
440015 _T_2014.09.26  Sierval eone-Bomball-Kambia 
93 ~J0017 T- 2014.09.26. SierraLeone.Bomball-Kambla 
“0079. T_2014-09-25. Sierval eone-Bomball-Kamba 
J0020-T 2014.08.25. Siertal.eane-Bomball-Kambia 


“J0022_T_2014-09.25.SievvaLeone-Bombal-Kamba 
.J0096_8_2014-10.27_SierraLeone-WesterUtban-Fresiown 
‘J0078_B 2014-09-30, SjerraL one WesieraRural-Haslings 
99 99 ‘40061. 2014-10-09 SiertaLeane-PortLoko-MarampaChiefdom 


“i073 2014-1011 ‘Siortaleone-PorlLoko-MarampaChiefdom 
‘J0142_ 7 2014-11-03 SertaLeone-WestemUrban-Freetown 
40114 B 2014-1028. Sierral eone-Poroko-MarampaChietsor 
‘0115-8 2014.10.26" SierraLeone-PortLoko-MarampaChiefdom 
“Ja025 T~2014-09-29 SierraL eane-PoriLaKo-Maforkichietaam 
“J0164.5 2014-1 1:08 SierraLeone-WesternRural-Coldbath 
“Joot1"8. 2074-09-29 SierraLeone-Porloka-Mayonkal 
“30080_B 2014-10-03. SierfaLeone-Portoko-Maforkichlefdom 
‘J0180.8. 2014-10-30, SierraLeone.WestemRural-Waterioo 


| 324 


a 


110012 8 2014.09.20. SierraLeone-Portloko-MaforkiChiefiom 


2074-10-06  SierraLeone-PoriLoko-Mlaforkichietdom 
‘J0163T 2014-11-08. Sieral cane Port oko-Koyachiefdom 
‘$0013 'B 2014-08-28" SlerraLeone-Pori oko-MarampaChiefdom 
0014. T_2014-08-29 SierraL eone-Portoko-talap 
0048.7 2014-10-08. SlerraLeone-WestemRural-Waterioo 
‘400888 2014.10-20- SierraLeane-PortLoko-KoyaChiefdom 
JOITE_B_2014-10.1 SiowaLoone-WestemRval Wataioo 


0088. 


100 
“10085 16-20%4-10-17 Serra one-Porl-oko-Marampactetdom 


J0301-B 2014-10-29 SiraLeone-PortL oko, MasimeraChiefdom 
‘30003"B-2014-08-28" SierraLeane.Porloka- Masimerachietdom 


110032" T_2014-09-30. SlerraLeone.PortLoko-MarampaChiefdom 


0074 T_2014-10.3, SierraLeane- WesternRural-Waterloo 
“Jo14i_T 2014-11-03. SierraLeone-WesternRural-Johntop 


2014-09-39 Seal apne-Porl 


70038-t ‘MaforkiChiaidom 


01038 2014-70-39 Sieral eane-PortLoko-Kambia 


“0108, T-201411-10_SierraL eone-WesternRural-W: 


"102. B.2014-10-27 SlerraLeone-ParlLoko-Kambla 


0104. T,2014-10-29. Sierra cone-PortLako-MasimeraChiefdom 
TOTOS_T_2014-10.29 Sierral eane-Portl oko-MaforkiChiefaam 


40044 T_2014-10.05 Sierra eone-WestemRural-Waterloo 
‘J0047"B_2014-10.05~ SierraLeane-WesternUsban-Freetown 
"J0124.T-2014-11-0% SierraLeone-WesternUtban-Freetown 
901328 2014-10-01. SierraLeone-WesiemUrban-Freetown 
0087 2014-10-22. Sieral-eone-Porlloke-MasimeraChiotdom 
‘J007B_T. 2014-10-17 Sierral gone. WesternUtban-Freetown 
40086, 8 2014-10-22. SierraLeone-WesternRural-Jul 
“Toss 1 2074-10-04. ‘SierfaL eone-WesiemUrban-Freetown 
99 ‘30080-12014 10-17 SIertaLeone-PortLoko-MaforkiChletdam 
‘J0093.T 2014.10.24. Sieraleone-WesternRural-Jahntop 
"30196. T 2074.17.01 SiertaLeone.WestamRural Rokel 
J0167,8 2074-11-08, SierfaLeone-WestemnUrban Freeiown 
“o0s9.T 2074.10.07 SiarraL eone-WestemRural-Keniluncton 
“40060_1" 2014-10-06. Sieral ear 
‘30065-12014. 10.09-SerraLeone-WesternUtban-Freetown 
ag. 2014-10-07 SieraLeone-WesterRural-Rokel 


‘Weste 
101286 2014-11-01, SierraLeone-WestemUrban-Freetown 
OTST _B 2014-11-06, SieraLeane-WesternRural-Coldbath 


“J0183_B 2014-11-06, Slerral eone.WesternRural Walerioo 
T0728 2014-11-10. SieraCoone-WesternRural-Hastings 


“Too04T-2074-09.28. Siaral eone-WesternRural-Cole 
“J0006..8. 2014.09.28. SieraLeone-WesternRural-Cole 


“J0069.T 2074-10.09. Sierral cone-WestemRural-Waterloo 


3.2.4 


‘J01708 2014-11-10. Sierra eone-WesternUtban-Freetown 
“Joo91-1-20 14-10-25" SierraLeone-WesternUrban-Freetown 

‘S018 8 2014-11-06  SlerraLeone-WestemRural-Waterioo 
"30028 1 2074-10-02. Sierral eane-Bomball-Kamakoni 


Jodge.T-2014.10.28, Serral or 


jastarnRural-Cole. 


“10408, T, 2014-10-28. SieraLeone-WesternRural-Waterloo 


0176. 
4J0045_T_2014-10.05_ SievraLeone. WestemUrban-Freetown, 


}_20%4-10-30_ SierraLeane-WesternRural-KentJunction 


‘O0E4 2014-1009. Sieral eone-WestemUrban-Freetown 


‘30068-T~-2014-10.08-SierraLeon 


10009 T_2014-09-28 SierraLeane-PortLoko-Rosint 
00708 2014-08°38, SierraLedne-PoAl oke-MaforkiChiotdom 
‘30028 B. 2014-10-01 SierraLeone-WesternRural-Waterloo 

Jo027 2014-10-02 Sierra sone-Bomball-Makure 

0081-8 2014-F0.03_SieraleGne-Portloko-MaforkiChiefdom 
“40085. T- 2014-10-06 Sierral.cone-PortLoko-MaforkiChiefdom 

30962 T” 2014-10208 ‘Sietal-sone-PorLoko-Romen! 

0083 B 2014-10-09. Sieralone- Port oko-MaforkiChietdam 
J0T09B 2014-70-31_SierraLeane-WestermUrban-Frealown 

J0162_B 2014-11-09 SievtaLeone-Kambia-Kambia 


“30084 B 2014-09-30, SierraLeone-PorlLoko-Kovachlatdom 
‘30067—T-2014-10.09SierraLeone-PortLoko-MaforkiChietdom 
“0130.7 2014-11-01 SiertaLeone-PonLok 
{013117-2014-10-307SertaLeone-PorLoko- 
“0166. T-2014-11-08  SirraL eone-WestemUrban Freetown, 
Jo175_T_ 2014-11-10. SirraLeone-WesternR 
‘J0037-B. 2014-10-03, SlerraLeone-Porlioko-Rosint 


jastarnRural- Waterloo 
“J0076."T_2074-10-17- SierraLeane-WesternRural-Waterloo 


0128 
T0067 _B_2014-10.05_SiraLeone-PorLake-MatorGhetdom, 


100 


“W01S1_T 2014-11-07 Sierraleone WestemUrban-Freeiown 


Jon2s_T_20140-02_SeraLeone- Westman Fretown 


"W0126 T, 2014-70-31. SierraLeane 


festemRural-CalabaTown, 
“J0107_T_2014-10.29 SierraLeone-WestemUrban-Freetown 


J0127_T 2014-10-30_SiertaLeone-WesterUtban-Freetown 
7 2014-10-31 SierraLeone.WesiernRurarWaterloo 
'J0058. 8. 2014-10-07. SierraLeone-PortLoko-KoyaChiefdom 


'2014-10-30_SierraLeone-WestemUtoan-Freetown 


(0-18, SirraL eone-WesternRural-York 
starnLoan Freetown 
‘SieraLeone-WestemRural-Rogbangba 
festornRural-Rogbangba 


32:0 


0138 T_2014-11-01_Sierral eane-WestemUrban-Freetown 


Jo145-172014-11-05~SierraLeone: 


.J0095_8_2014-10.27_SierraLeone-WesternUtban-Freeiown 
“J0758 T2014. 11-06, SiertaLeone.-WastarnRural-Hastings 
‘J6110. 72014-1031 Slerral.cane.WesternLUtban-Freetown 


temnRural- cole 


“W041. 2018-10-30. Sieral-eone-WesternUtban-Freelown 


‘T0087 T2014 10-27 SierraLeone-WestemRural-Wateriog 
‘JOTTT_B_2014-70:30 Sierra eone-WestemUrban-Freetown 
70120. 2014-10.30. SieraL eone-WesiemUrban-Freetown 
40188. 7- 2014-11-07 Sierral cone WesternRural-Graften 


‘J0Da3. T 2014-10-03. SieraLeane.Portioko-MaforkiChiefdom 


oOSE_T_2014-F0.03.Sierral eane-Porloko-MaforkiChiefdom 
Joa7ar-2018-1 
0072 T_2014-10:209. SiervaL cane Poritoko-BuyaRomenseChietdom 


}-08 “Sierra eone-PortLoko-BuyaRtomendeChiefdom 


Jt abi4-10'07 Sierra eone-Portoko-Matorkiieldom 


135, 
“Jooase-doreciet 
‘Jooe 


8.0E-5 


Extended Data Figure 1 | Phylogenetic tree of the 2014 EBOV inferred using MrBayes. The seven novel sublineages are highlighted using different colours. 
Previously described EBOV sequences are shown in grey. Posterior probability for each lineage is shown. 
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Extended Data Figure 2 | Maximum likelihood tree of 2014 EBOV constructed using RAxML. 
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Extended Data Figure 3 | Phylogeographic inference of the 2014 EBOV using BEAST. Previously described EBOV sequences are shown in grey. Posterior 


probability for each lineage is shown. 
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GGTCTATTGATTGTCAAAACAGTACTTGATCATATCCTACAAAAGACAGAACGAGGAGT TCGTCTCCATCCTCT TGCAAGGACCGCCAAGGTA 


GGTCTATTGATTGTCAAAACAGTACTTGATCATATCCTACAAAAGACAGAACGAGGAGT TCGEC TCCATCCECT TGCAAGGACCGCCAAGGTA 


AAAAATGAGGTGAACTCCT TCAAGGCTGCACTCAGCTCCCTGGCCAAGCATGGAGAGTATGCTCCTTTCGCCCGACTTTTGAACCTTTCTGG 
AAAAATGAGGTGAACTCCTTCAAGGCTGCACTCAGCTCCCTGGCCAAGCATGGAGAGTATGCTCCETTCGCCCGACTETTGAACCTTTCTGG 


AGTAAATAATCTTGAGCATGGTCTTTTCCCTCAACTGTCGGCAAT TGCACTCGGAGT CGCCACAGCCCACGGGAGCACCCTCGCAGGAGTAA 


AGTAAACAATCTTGAGCATGGTCTTTTCCCTCAACTGTCGGCAAT TGCACTCGGAGT CGCCACAGCCCACGGGAGCACCCTCGCAGGAGTAA 
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AAT TGCAATAATTGACTCAGATCCAGTTTTACAGAATCTTCTCAGGGATAGTGATAACATCTT 
AATTGCAATAATTGACTCAGATCCAGTTTTACAGAATCT TCTCAGGGATAGTGAGAACATCET 


Noonan Wanner ln 


TTTAATAATCCGTCTACTAGAAGAGATACTTCTAATTGATCAATATACTAAAGGTGCTTTACACCATTGTCTCTTTTCTCTCCTAAATGTAG 
TEGAATAATCCGTCTACTAGAAGAGATACT TCGAATTGATCAATAGACTAAAGGTGCTT TACACCAGTGTCTCEGETCTCTCCEAAATGTAG 
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TTGGAGT TACAGGTGTTATAAT TGCAGTTATCGCTTTATTCTGTATATGCAAATTTGTCTTTTAGTCTTTCTTCAGATTGTTTCACGGCAAAA 
TTGGAGCCACAGGTGETATAATTGCAGTTATCGCTTTATTCTGTATATGCAAATTTGTCETETAGTCTTTCTTCAGATEGTTTCACGGCAAAA 


9475 


TCAGTACTATAATCACTCTCATTTCAAATTGATAAGATATGCATAATTGCCTTAATATATAAAGAGGTATGATATAACCCAAACAT TGACCAAA 
TCAGTACTATAATCACTCTCATT TCAAATEGACAAGATATGCAGAAT TGCCTCAATATATAAAGAGGTATGATATAACCCAAACAT TGACCAAA 


AWWW d/l nwa nennln 


9660 


GAAAATCATAATCTCGTATCGCTCGCAATATAACCTGCCAAGCATACCTCTTGCACAAAGTGATTCTTGTACACAAATAATGTTTGACTCTA 
GAAAATCATAATCTCGTATCGCTCGCAATATAACCTGCCAAGCATACCTCT TGCACAAAGTGACTCTTGTACACAAACAATGEECTGACTCTA 


Extended Data Figure 4 | Original sequencing results of the serial T > C substitutions using the Sanger method. All of the four regions including serial T > C 
substitutions were sequenced using the Sanger method with the primers provided in Extended Data Table 1. 
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Extended Data Figure 5 | Synonymous and non-synonymous substitutions _ intergenic. b, Gene-specific global dN/dS estimates. The dN/dS and 95% 
of the 2014 EBOV. a, Distribution of synonymous and non-synonymous highest posterior density interval were calculated using HyPhy. c, Lineage- 
substitutions in different lineages. The numbers of substitutions are labelled specific global dN/dS estimates. 

within bars. NS, non-synonymous; S, synonymous; UTR, UTR region; IG, 
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Extended Data Table 1 | Primers designed for the confirmation of the serial T > C substitutions 


Sample ID Target region Primer 

J0150 & JO157 1139-1474 GGACATGATGCCAACGATGC(+) 
ATTTACTCCTGCGAGGGTGC(-) 

J0169 5434-5731 GTCTTCCAGCTGTGGTTGAGA(+) 
AAGATTGACATTTGAATCACCGT(-) 

J0127 7967-8200 TGGTGGACAGGATGGAGACA(+) 
GGCTATGTTTGAAGCTCCAGTG(-) 

J0024 & JO028 9401-9783 CCTTCTACTTGATCACAATACTCCG(+) 


CCTCCTCCACAACTTGAAGCA(-) 
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West Africa is currently witnessing the most extensive Ebola virus 
(EBOV) outbreak so far recorded'~*. Until now, there have been 
27,013 reported cases and 11,134 deaths. The origin of the virus is 
thought to have been a zoonotic transmission from a bat to a two- 
year-old boy in December 2013 (ref. 2). From this index case the 
virus was spread by human-to-human contact throughout Guinea, 
Sierra Leone and Liberia. However, the origin of the particular 
virus in each country and time of transmission is not known and 
currently relies on epidemiological analysis, which may be unre- 
liable owing to the difficulties of obtaining patient information. 
Here we trace the genetic evolution of EBOV in the current out- 
break that has resulted in multiple lineages. Deep sequencing of 
179 patient samples processed by the European Mobile Laboratory, 
the first diagnostics unit to be deployed to the epicentre of the 
outbreak in Guinea, reveals an epidemiological and evolutionary 


°8 Pierre Formenty*’ & Stephan Giinther 


2,14,15% 


history of the epidemic from March 2014 to January 2015. Analysis 
of EBOV genome evolution has also benefited from a similar 
sequencing effort of patient samples from Sierra Leone. Our results 
confirm that the EBOV from Guinea moved into Sierra Leone, 
most likely in April or early May. The viruses of the Guinea/ 
Sierra Leone lineage mixed around June/July 2014. Viral sequences 
covering August, September and October 2014 indicate that this 
lineage evolved independently within Guinea. These data can be 
used in conjunction with epidemiological information to test ret- 
rospectively the effectiveness of control measures, and provides an 
unprecedented window into the evolution of an ongoing viral hae- 
morrhagic fever outbreak. 

We used a deep sequencing approach to gain insight into the evolu- 
tion of Ebola virus (EBOV) in Guinea from the ongoing West African 
outbreak. This was an approach based on analysis pipelines developed 
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Figure 1 | Geographical location, sequence read depth, and read depth vs 
C, value of patient samples. a, Geographical location of patient samples. The 
origin of the sequenced samples (one sample per patient) from Guinea, Sierra 
Leone, and Liberia processed by EMLab Guéckédou are plotted as numbers of 
cases by district. EMLab data are overlaid on an Ebola outbreak distribution 
map where cumulative cases are plotted as a heat map (low (yellow) to high 
(brown)) of confirmed cases from March 2014 to January 2015. Case data 
sourced from World Health Organization (WHO) Ebola response situation 
reports (http://apps.who.int/ebola/en/ebola-situation-reports); Geographic 
Information Systems (GIS) data sourced from Environmental Systems 
Research Institute (ESRI) and Database of Global Administrative Areas 


for a guinea-pig model of EBOV infection and Hendra virus infection 
of human and bat cells*°. Here we use this approach to derive con- 
sensus EBOV genomes from individual patient samples that can be 
used to study viral genome evolution during the course of the outbreak. 
Viral genomes were derived primarily from blood samples that had 
been taken from patients in Guinea and sent to the European Mobile 
Laboratory (EMLab), deployed by the World Health Organisation 
within the Médecins Sans Frontieres Ebola Treatment Centre 
Guéckédou in March 2014 to aid the diagnostic effort. With the per- 
mission of Guinean authorities a biobank of samples was assembled 
which had known provenance of EBOV infection. Linked to each 
sample were the following data: patient location (to district level), 
sample collection date, disease onset and outcome. The collection 
dates were a median of 4 days after the date of onset of symptoms. 
Baseline data was cleaned, formatted and imported into the 
Geographic Information System, ESRI ArcGIS. Statistical tools were 
used to generate tabular output and to join the numeric case data with 
the district level boundaries of Guinea, Liberia and Sierra Leone (dis- 
trict geometries freely available from http://www.gadm.org/) (Fig. 1a). 
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(GADM; http://www.gadm.org/). b, Sequence depth per nucleotide position. 
The number of reads for each nucleotide position was plotted across the 

full length of the virus genome for each of the 179 virus isolates we analysed. 
In red is shown the uniformity of the depth across individual genomes, 
although the median number of reads per nucleotide position had a variation 
spanning over four logy, units. c, Linear regression of the log;y) median 
sequence depth of each virus isolate versus the C; value of the viral load as 
determined by qRT-PCR. Red dots indicate samples taken from patients who 
went on to survive EBOV infection and grey shaded dots are from patients 
who records suggest died from EBOV infection. 


The viral genome sequence was derived from RNA sequencing 
analysis of the patient samples with no pre-amplification of the viral 
genome. In general we selected a range of samples from both males and 
females of different ages and a fair representation of sequences for each 
month (Extended Data Fig. 1), and with C, values less than 20 for 
EBOV RNA. In this selected patient cohort, with a relatively high viral 
load, there was approximately 80% mortality. The read depth mapping 
to the EBOV genome varied between samples and regions in the 
genome (Fig. 1b) and in general the number of sequence reads 
obtained for each genome correlated with the amount of viral load 
as determined by quantitative reverse-transcription PCR (qRT-PCR) 
(Fig. 1c). 

Phylogenetic analysis revealed the dynamic nature of the epidemic 
and molecular change in the viral sequence (Fig. 2a). Several distinct 
lineages were identified, with an initial lineage A (Figs 2a, 3 and 
Extended Data Fig. 2) linked to early Guinean cases dating from 
March 2014 including the three original viruses published by Baize 
et al.”. A second lineage, B, emerged in May and June and comprises all 
the sequences from Gire et al.° and the remainder of those described 
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Figure 2 | Phylogenetic relatedness and nucleotide sequence divergence of 
EBOV isolates from the 2013-2015 outbreak. a, Phylogenetic relatedness of 
EBOV isolates. Phylogenetic tree inferred using MrBayes"' for full-length 
EBOV genomes sequenced from 179 patient samples obtained between March 
2014 and January 2015. Displayed is the majority consensus of 10,000 trees 
sampled from the posterior distribution with mean branch lengths. Posterior 
support is shown for selected key nodes. Twenty-two samples originated in 
Liberia and were collected between March and August 2014 and six samples 
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from Sierra Leone were obtained in June and July 2014. In our analysis we also 
included published sequences, including the three early Guinean sequences” 
and 78 sequences described by Gire et al.°. A number of lineages predominantly 
circulating in Guinea are denoted as GN1-4 along with a uniquely Sierra Leone 
lineage (SL3) recognised in Gire et al.°. b, EBOV nucleotide sequence 
divergence from root of the phylogeny in Fig. 2a plotted against time of 
collection of each virus. The date of the first documented case near Meliandou 
in eastern Guinea is indicated by the red triangle. 
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Figure 3 | A time-scaled 
phylogenetic tree of 262 EBOV 
genomes from Guinea, Sierra 
Leone, Liberia and Mali. Shown is a 
maximum clade credibility tree 
constructed from 10,000 trees 
sampled from the posterior 
distribution with mean node ages. 
Clades described in Gire et al.° are 
identified here (SL1, SL2 and SL3) as 
well as a number of lineages 
predominantly circulating in Guinea 
and posterior probability support is 
given for these. For certain key node 
ages, 95% credible intervals are 
shown by horizontal bars. 
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Figure 4 | Position of non-synonymous amino acid variations in the 179 
genomes analysed in this study compared to a reference sequence taken 
from March 2014 (KJ660346.2). Shown is the frequency of all amino acid 
positions that had variability and the substitution that occurred with the first 
single letter position indicating the reference sequence and the second 
position showing the variation. The percentage frequency in the 179 genomes 
is shown on the y axis. GP, glycoprotein; NP, nucleoprotein; L, RNA 
polymerase; VP, viral protein. 


here. As the epidemic expanded, lineage A remained confined in 
Guinea from March to June 2014, except for one sequence from 18 
July 2014. A single Liberian sequence from March 2014 grouped 
within this lineage. No further EBOV genomes that we sequenced 
from samples taken after July 2014 belonged to lineage A. This clade 
was likely to have been associated with the original outbreak in Guinea 
and was almost successfully contained in May 2014 by the interven- 
tions of the multi-agency response. Two clusters of Sierra Leone 
viruses described by Gire et al.° (denoted by the authors as clusters 
SL1 and SL2), both of which contain later viruses from Guinea and 
Liberia, suggest continued spread across the border during this time. 
Early cases in SL1 and SL2 were both associated with a single funeral’, 
so it is possible that this event may have reignited the epidemic. 
Thereafter, lineage B spread into Guinea, Liberia and Sierra Leone. 
This lineage is associated with the large epidemics in these three coun- 
tries and persisted into 2015. The spatiotemporal spread of these 
viruses based on the phylogenetic analysis presented in Figs 2a and 3 
was summarized (Extended Data Fig. 3) and indicated how the virus 
may have spread between the neighbouring countries. There was no 
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evidence from the data that increases or decreases in mortality were 
associated with any particular virus cluster (Extended Data Fig. 4). 

The Bayesian time-scaled phylogenetic analysis estimated an aver- 
age rate of evolution over the genome of 1.42 X 10° * substitutions per 
site per year with 95% credible intervals of 1.22 X 10 * and 1.62 X 
10°. Details of the model assumptions are given in the Methods 
section. This rate is lower than that initially described for the West 
African outbreak by Gire et al.° but still higher than the long-term, 
between-outbreak rate of 0.8 X 10 * estimated using viruses back to 
the 1976 Yambuku outbreak®. This apparent drop in rate of evolution 
between these two studies is consistent with the explanation provided 
by Gire et al.° that the short sampling interval (March to June) pro- 
vided insufficient time for the action of purifying selection. However, 
the much longer sampling interval in the present study may simply be 
providing a more precise estimate of the rate. It should be noted, 
however, that the between-outbreak rate will exclusively reflect trans- 
mission and evolution that has occurred in the non-human reservoir 
species, so may not be directly comparable to the rate within a human 
outbreak. We observed no evidence of a change in evolutionary rate 
over the course of the epidemic with the accumulation of genetic 
change having a linear relationship with time (Fig. 2b), confirming 
that the apparent decline in rate between the two studies is an obser- 
vational phenomenon’ rather than a change in the virus. 

The estimate of the date of the most recent common ancestor of the 
sampled viruses is mid-January 2014 (95% credible intervals 12 
December 2013, 18 February 2014). Although this is an estimate of 
first transmission event that resulted in more than one lineage in our 
sample, this provides an upper bound on the date of emergence of the 
virus into the human population. This date estimate is consistent with 
the epidemiological tracing of the first suspected cases to December 
20137. 

Given the error-prone nature of EBOV genome replication we 
examined the potential amino acid variation in EBOV proteins from 
the start of our sample collection in March 2014 to January 2015. The 
location of amino acid changes on EBOV proteins and their relative 
representation in the 179 assembled genomes were compared to an 
isolate identified in March 2014 (ref. 2) (Fig. 4). While there is amino 
acid variation in all of the genomes sampled, there were very few 
changes in viral protein 30 (VP30), viral protein 40 (VP40) and viral 
protein 24 (VP24), and these changes are only in less than ~2% of the 
genomes sampled. However, a single amino acid substitution in VP24 
is associated with adaptation to a new host**, and this may be due to 
interactions with host-cell proteins”’®. While some of the variation 
may be attributed to a purely random molecular clock pattern, in 
GP, VP35, NP and L there are some amino acid variations that are 
present in over ~15% of the genomes sampled. For example, in GP 
there is an A to V substitution in ~70.5% of the genomes sampled 
compared to the reference genome. Implications of the mutations 
within GP in relation to immune escape of therapeutics and vaccines 
will need to be assessed in pseudotype neutralization assays using 
EBOV monoclonal antibodies and serum from people who have been 
vaccinated. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. There was no 
randomization or blinding in selection of samples for sequencing. 

Ethics statement. The National Committee of Ethics in Medical Research of 
Guinea approved the use of diagnostic leftover samples and corresponding patient 
data for this study (permit no. 11/CNERS/14). As the samples had been collected 
as part of the public health response to contain the outbreak in Guinea, informed 
consent was not obtained from patients. 

Genome sequencing and consensus building. Viral genome sequence was 
derived from the RNA extracted for diagnostic purposes from blood samples in 
the field with no pre-amplification of the viral genome. These samples were 
processed by the EMLab and are detailed in Supplementary Table 1, which indi- 
cates sample name, geographical location, date of onset of symptoms, date sample 
was collected, and the C, value of EROV RNA at the date of test. The clinical status 
is also indicated as well as malaria co-infection where known. Extracted RNA was 
DNase treated with Turbo DNase (Ambion) using the rigorous protocol. RNA 
sequencing libraries were prepared from the resultant RNA using the Epicentre 
ScriptSeq v2 RNA-Seq Library Preparation Kit. Following 10-15 cycles of amp- 
lification, libraries were purified using AMPure XP beads. Each library was quan- 
tified using Qubit and the size distribution assessed using the Agilent 2100 
Bioanalyzer. These final libraries were pooled in equimolar amounts using the 
Qubit and Bioanalyzer data with 9-10 libraries per pool. The quantity and quality 
of the pool was assessed by Bioanalyzer and subsequently by qPCR using the 
Illumina Library Quantification Kit from Kapa on a Roche Light Cycler 
LC480II according to manufacturer’s instructions. Each pool of libraries was 
sequenced on one lane of a HiSeq2500 at 2 X 125-bp paired-end sequencing with 
v4 chemistry. 

The trimmed fastq files were first aligned to a copy of the human genome using 
Bowtie2 (ref. 12) and the unaligned reads were then mapped with Bowtie2 to a list 
of 3731 known viral genomes excluding EBOV genomes. The reads that were still 
unmapped were then aligned to the EROV genome—either the prototype strain 
isolated in Zaire in 1976 (AF086833.2) or a strain isolated during the current 
outbreak (KJ660348.2). For this step we again used Bowtie2 and the resultant 
alignment files were filtered with samtools to remove unmapped reads and reads 
with a mapping quality score below 11, followed by filtering with markdup to 
remove PCR duplicates. The resultant BAM file was then analysed by 
Quasirecomb”’ to generate a phred-weighted table of nucleotide frequencies which 
were parsed with a custom perl script to generate a consensus genome in fasta 
format. This consensus genome was then used as a reference genome to which we 
remapped the sequence reads which did not map to the human genome or other 


viruses in order to generate a second consensus. In this way we were able to 
manually determine if the reference genome used by Bowtie2 influenced the 
process of calling a consensus genome. In addition, we used FreeBayes to inde- 
pendently call and identify SNPs and indels. The pipeline is entirely open source 
and implemented in the Galaxy environment", a Galaxy compatible workflow, 
novel scripts and XML wrappers needed for implementation in Galaxy are freely 
available and included in Supplementary Data File 1. Sequence alignment maps 
were manually inspected and curated over regions with consistent low coverage 
(for example, at the 5’ ends). 

Phylogenetic analysis. Phylogenetic analysis comprised the 179 EBOV genomes 
from this study, 78 genomes from Sierra Leone’, three sequences from Guinea’ 
and two sampled from Mali’*. The genomes were partitioned into four sets of 
sites—1st, 2nd and 3rd codon positions of the protein-coding regions and the non- 
coding intergenic regions—with each partition being assigned a generalized time 
reversible substitution model'*, gamma distributed rate heterogeneity'’ and a 
relative rate of evolution. This model was used to construct a Bayesian nucleotide 
divergence tree (Fig. 2) using MrBayes'' and a time-scaled phylogenetic analysis 
(Fig. 3) using BEAST'* with a log-normal distributed relaxed molecular clock”, 
and the ‘Skygrid’ non-parametric coalescent tree prior’. The alignments and 
control files for both analyses are available in Supplementary Data Files 2 and 3 
and provide documentation of all model parameters. 
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Extended Data Figure 1 | Spatial and temporal location of patient samples. April 2014, 14; May 2014, 14; June 2014, 22; July 2014, 16; August 2014, 19; 
Geographical locations of sequenced samples are plotted by district as September 2014, 18; October 2014, 21; November 2014, 11; December 2014, 22; 
panels for each month of collection (March 2014-January 2015). Inbrief,the January 2015, 11. Total number of samples sequenced, 179. 

number of samples obtained for each month was as follows: March 2014, 11; 
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Extended Data Figure 2 | Enlarged view of phylogenetic tree presented in Fig. 3. Posterior support shown where >0.5. 
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Extended Data Figure 3 | Temporal spread of EBOV based on phylogenetic 
analyses in Figs 2a and 3. Colour scheme is as follows: Guinea is red/blue (1st 
half/2nd half of 2014, respectively), Sierra Leone is grey-black, Liberia is green, 
Mali is brown. Lineage A (A) is associated with the initial focus of the outbreak 
(Guéckédou, Macenta and Kissidougou) in March 2014, expanded around this 
area and then declined around July 2014. From lineage A a second lineage (B) 


emerged in May/June 2014 and expanded into Sierra Leone (end of May 2014) 
and Liberia (small arrow). Lineage B continued to spread into Sierra Leone, 
Liberia, and further into Guinea (beyond the original focus into most districts of 
Guinea). EBOV disease entered Mali from Guinea via two separate routes 
(from the Beyla district (possibly originally from Kissidougou) in October 2014 
and from the Siguiri district in November 2014). 
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sequences available for GN1 were collected during the period of March—-July 
2014 and the sequences available for GN2 were collected during the period of 
August 2014—January 2015. Red dots indicate survivors. 


Extended Data Figure 4 | Survival rate amongst individuals with known 
EBOV sequences. The total survival rate for the 179 sequenced virus isolates 
included in this study is presented, as is the survival rate for two sub-lineages, 
GN1 and GN2, as defined by phylogenetic inference in Figs 2a and 3. The 
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Distinct lineages of Ebola virus in Guinea during the 
2014 West African epidemic 
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An epidemic of Ebola virus disease of unprecedented scale has been 
ongoing for more than a year in West Africa. As of 29 April 2015, 
there have been 26,277 reported total cases (of which 14,895 have 
been laboratory confirmed) resulting in 10,899 deaths’. The source 
of the outbreak was traced to the prefecture of Guéeckédou in the 
forested region of southeastern Guinea~’. The virus later spread to 
the capital, Conakry, and to the neighbouring countries of Sierra 
Leone, Liberia, Nigeria, Senegal and Mali’. In March 2014, when the 
first cases were detected in Conakry, the Institut Pasteur of Dakar, 
Senegal, deployed a mobile laboratory in Donka hospital to provide 
diagnostic services to the greater Conakry urban area and other 
regions of Guinea. Through this process we sampled 85 Ebola 
viruses (EBOV) from patients infected from July to November 
2014, and report their full genome sequences here. Phylogenetic 
analysis reveals the sustained transmission of three distinct viral 
lineages co-circulating in Guinea, including the urban setting of 
Conakry and its surroundings. One lineage is unique to Guinea 
and closely related to the earliest sampled viruses of the epidemic. 
A second lineage contains viruses probably reintroduced from 
neighbouring Sierra Leone on multiple occasions, while a third lin- 
eage later spread from Guinea to Mali. Each lineage is defined by 
multiple mutations, including non-synonymous changes in the vir- 
ion protein 35 (VP35), glycoprotein (GP) and RNA-dependent RNA 
polymerase (L) proteins. The viral GP is characterized by a glycosy- 
lation site modification and mutations in the mucin-like domain 
that could modify the outer shape of the virion. These data illustrate 
the ongoing ability of EBOV to develop lineage-specific and poten- 
tially phenotypically important variation. 

We combined our 85 Guinean EBOV sequences (Extended Data 
Table 1) with 110 publicly available 2014 EBOV genome sequences 
sampled from Guinea, Mali and Sierra Leone, producing a total data 
set of 195 sequences. Phylogenetic analysis reveals greater genetic 
diversity than previously described, with the presence of three distinct 
lineages, in contrast to the relatively limited variation documented early 
in the Sierra Leone outbreak’ (Fig. 1). The first lineage (denoted GUI-1) 
represents a cluster of sequences only found in Guinea, although from 
all urban and rural regions sampled in this country, and that is most 
closely related to the earliest viruses sampled in March 2014 (ref. 2). 
This lineage co-circulated in the greater Conakry region with viruses of 
the remaining two lineages described below. Notably, GUI-1 is char- 
acterized by multiple non-synonymous mutations in the nucleoprotein 
(NP), VP35 and GP such that it may also be phenotypically distinct, 
although this will require future experimental verification (Fig. 2a). 


These data also reveal that EBOV sequences from the two docu- 
mented introductions into Mali (October and November 2014) belong 
to another larger cluster of Guinean viruses, denoted here as GUI-2. 
This phylogenetically distinct lineage is most closely related to the 
second cluster of Sierra Leone sequences (SLE-2), and could represent 
either a reintroduction from Sierra Leone or the continued diffusion in 
Guinea of strains related to those initially introduced to Sierra Leone. 
Finally, a third cluster of viruses (SLE-GUI-3) is found in Conakry, 
Forécariah, Dalaba and to a limited extent in Coyah (Fig. 2b), with 
multiple sequences falling within the third cluster* of Sierra Leonean 
sequences. Such a phylogenetic structure suggests that there have been 
multiple migrations of EBOV into Guinea from Sierra Leone (although 
viral traffic from Guinea to Sierra Leone may also have occurred on 
occasion). An example of such cross-border virus traffic is a documen- 
ted case that initiated a transmission chain in June 2014 in Conakry’, as 
well as transmission chains in Dalaba (260 km from Conakry), each of 
which is directly linked to different travellers from Sierra Leone. 
Although the numbers are small, the decreasing proportion of these 
sequences (matching the third cluster of Sierra Leone sequences) along 
the road from the Sierra Leonean border towards Conakry via 
Forécariah, and deeper inland in Coyah, might reflect the major trans- 
mission route of these viruses (Fig. 2b). 

The area constituted by the urban setting of Conakry and the neigh- 
bouring prefectures harbours extensive EBOV genetic diversity, char- 
acterized by multiple co-circulating viral lineages. For example, all three 
lineages defined above (GUI-1 to SLE-GUI-3) co-circulated in Conakry 
during September and October 2014 (Figs 1 and 2c). Although early 
concerns associated with the presence of Ebola virus disease in high 
population density settings such as Conakry did not result in increased 
viral transmission, the capital city of Guinea nevertheless represents an 
important regional travel hub and highlights the challenge of control- 
ling Ebola virus disease in and near large urban centres. In addition, 
although it is clear that a number of the EBOV strains circulating in 
Guinea are also present in neighbouring Sierra Leone (lineage SLE- 
GUI-3), reflecting the continued mobility of individuals between these 
localities during the peak of the epidemic and in the face of outbreak 
control measures, Guinea is also characterized by a number of indepen- 
dently evolving viral lineages, such that the epidemics in these countries 
have generated localized genetic diversity. Despite case reports reaching 
very low values in Conakry at several points during the summer of 2014, 
the recurrent transmission of the three distinct lineages in this locality is 
another indication of the challenges of controlling Ebola virus disease in 
large urban centres with highly mobile populations. 
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Figure 1 | Maximum clade credibility (MCC) phylogenetic tree of the 195 
EBOV isolates from West Africa. Tip times are scaled to the date of sampling 
(with a timescale shown on the x axis), and colour-coded according to the 
geographic location of sampling (at the district level for Guinea, and country 
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A total of 207 single nucleotide polymorphisms (SNPs) (51 
non-synonymous, including 24 novel, 88 synonymous and 68 inter- 
genic), have been fixed in individual patients within the sample of 
viruses analysed here. In contrast to the situation early in Sierra 
Leone, the viruses sampled from Guinea harbour numerous non- 
synonymous mutations which define lineages (Fig. 2a). Notably 
in GP, in which mutations could affect the efficacy of vaccines 
or antibody treatments, a C7025T (Pro—Ser) substitution in part 
defines GUI-1, and belongs to the heavily glycosylated mucin-like 
domain. Although O-glycosylation does involve the attachment of 
N-acetylglucosamine (GlcNac) to a serine (and/or threonine) residue, 
the sialylation pattern of this disordered domain appears to vary with 
the cellular environment. Two mutations (A6357G (Asn—Asp), in 
GP1 domain II and G7476A (Gly—Asp) in GP1 carboxy terminus) 
co-occur in a later branch of this cluster, whereas C7256T (His—Tyr), 
again in the mucin-like domain, is observed in another branch. We also 
observed one change in a glycosylation site (A6726G (Thr—Ala)) ina 
sub-cluster of sequences in SLE-GUI-3. Surprisingly, a mutation in the 
highly conserved interferon inhibitory domain of VP35 (C4116T) 
introduces a phenylalanine, characteristic of Sudan EBOV, but never 
previously observed in EBOV-Zaire. Another mutation in VP35, 
G3151A (Arg—Lys), lies in the sequence targeted by AVI-7539, a 
phosphorodiamidate morpholino oligomer (PMO)-based therapeutic 
candidate®. Studies of the phenotypic consequences of such mutations 
on viral components directly interacting with the host immune res- 
ponse could provide key insights into their epidemic potential, and also 
inform the therapeutic options currently considered for deployment”®. 

There has been some debate over the rate at which EBOV has evolved 
during the West African outbreak of EBOV, and what this may mean for 
the adaptive capacity of the virus, including changes in virulence’. Our 
estimates of the rate of nucleotide substitution for the combined Guinea 
and Mali and Sierra Leone data set under both strict and relaxed molecu- 
lar clocks and using a variety of demographic and substitution models fall 
within the range of those obtained previously for EBOV*”’, with mean 
rates of between 0.87 X 10~* to 0.91 X 10~* nucleotide substitutions per 
site per year (range of credible intervals of 0.68 X 10 * to 11X10 * 
substitutions per site per year) (Extended Data Fig. 2). Essentially ident- 
ical rates were observed when studying the Guinean viruses in isolation. 
However, these rates are lower than those observed during the early 
spread of the virus in Sierra Leone’. It is therefore possible that the rate 
estimate provided by ref. 4 represents a random fluctuation due to lim- 
ited genetic variation within sequences from Sierra Leone sampled over a 
relatively short time-period, and/or has been elevated by the presence of 
transient deleterious mutations that have yet to be removed by purifying 
selection, as suggested by those authors*. Indeed, evolutionary rates in 
RNA viruses are known to have a strongly time-dependent quality, such 
that they are expected to be higher in the short-term than the long-term”. 
In addition, it is possible that differences in rate estimates in part reflect 
minor differences in substitution model parameters, the duration of 
intra-host virus evolution, as well as local epidemiological variation. 
More generally, it is difficult to translate relatively small differences in 
estimates of substitution rate, such as those obtained for EBOV in West 
Africa, into predictions on the future evolution of such key phenotypic 
traits as virulence, as the latter are more dependent on the nature of the 
selection pressures acting on the virus as well as the complex relationship 
between virulence and transmissibility. The data presented here indicates 
EBOV is able to generate and fix nucleotide and amino acid variation 
within co-circulating viral lineages on the time-scale of individual out- 
breaks, including the presence of country-specific lineages, and which 
may ultimately produce variants with important fitness differences. 

Continued genomic surveillance is a strong complement to some- 
times difficult local epidemiological investigations. We believe that the 
deployment of additional next-generation sequencing facilities in the 
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Figure 2 | Patterns of mutation accumulation during the 2014 epidemic. 
a, Mutations found in at least two separate sequences, showing one patient per 
row. Grey blocks indicate identity with the Kissidougou Guinean sequence 
(GenBank accession KJ660346). The top row shows the type of mutation (dark 


West African surveillance network, thereby avoiding the logistical and 
regulatory’* hurdles associated with long-distance sample transporta- 
tion, will positively contribute to the control of the current epidemic 
and help limit future outbreaks. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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grey, intergenic; green, synonymous; red, non-synonymous), with the genomic 
location indicated above. Cluster assignment is shown at the left. b, The 
geographic distribution of EBOV variants, coloured by clusters. c, Number of 
Ebola virus disease patients sequenced per ten days, coloured by cluster. 
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METHODS 

Ethics statement. This study has been evaluated and approved by the ethics 
committee of Guinea (ref: 35/CNERS/15), the Ebola research committee and the 
Institutional Review Board at Institut Pasteur. The Office of the Guinean Ethics 
and Scientific Review Committee granted a waiver to provide written consent to 
sequence and make publicly available viral sequences obtained from patient and 
contact samples collected during the Ebola virus disease outbreak in Guinea. 
Sample collection and processing. Samples were collected from suspected Ebola 
cases hospitalized at an Ebola treatment centre in Conakry (Donka hospital) or 
from other regions of Guinea. EBOV detection was tested by quantitative RT-PCR 
using a Taqman assay with 5-FAM and 3-TAMRA probes on a portable Smart- 
Cycler TD. Each sample was run three times on three separate assays. 

Carrier RNA and host ribosomal RNA depletion. Carrier RNA and host ribo- 
somal RNA was depleted from RNA samples as described in ref. 14, using the 
NEBNext rRNA Depletion Kit (New England Biolabs). 

cDNA synthesis, Nextera library construction and Illumina sequencing of 
EBOV samples. RNA from selective depletion was used for cDNA synthesis 
and Illumina library preparation as described previously*"*. Each individual sam- 
ple was indexed with a unique dual barcode and libraries were pooled equally and 
sequenced on a HiSeq2500 (101-base-pair (bp) paired-end reads; Illumina) plat- 
form. 

Demultiplexing of raw Illumina sequencing reads. Illumina Analysis Pipeline 
version 1.8 was used for image analysis, base calling, error estimation and demul- 
tiplexing. 

Mapping of full-length EBOV genomes. Sequencing read pairs were obtained, 
from which low-quality bases and remaining adaptor/barcode sequences were 
removed. Reads were mapped to a 2014 EBOV genome (GenBank accession 
number: KM233070) using the CLC Genomics Assembly Cell v4.2 implemented 
in Galaxy'*""”. All genomes generated here were annotated and manually inspected 
for accuracy, such as the presence of intact open reading frames, using Geneious v8 
(ref. 18). Multiple sequence alignments across all EBOV from the 2014-2015 
outbreak were generated by first aligning amino acid sequences using 
MUSCLE”, and then aligning the nucleotide sequence based on the amino acid 
alignment. 

Screening for recombinant sequences. To screen for potential recombination in 
the 2014-2015 EBOV sequences we used the RDP, GENECONV, MAXCHI, 
CHIMAERA, 3SEQ, BOOTSCAN and SISCAN methods as implemented in the 
RDP4 (ref. 20) software package with default settings. No recombinant sequences 
were identified, nor was there any evidence for phylogenetic incongruence among 
the sequences analysed here. 

Phylogenetic tree inference. Phylogenetic trees on our total data set of 195 
sequences, 18,959 bp alignment length, were estimated using the maximum like- 
lihood (ML) procedure in RAxML v8, employing the GIR-I’ model of nucleotide 
substitution. Fifty instances were run to obtain the best tree, and statistical support 
for each node was calculated using the standard bootstrapping algorithm with 500 
pseudoreplicates (Extended Data Fig. 1). A topologically equivalent ML tree was 
obtained using the GTR+T model available in PhyML”!. 

Analysis of evolutionary rates. We employed the Bayesian Markov Chain Monte 
Carlo (MCMC) method in BEAST v1.8 (ref. 22) to estimate the rate of EBOV 
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evolution (nucleotide substitution) during the 2014-2015 epidemic. The date 
(day) for each individual sample was based on the time of diagnostic testing. 
Importantly, very similar estimates were obtained using a variety of substitution, 
coalescent and molecular clock models, namely (i) the HKY-T' and GTR-T nuc- 
leotide substitution models (with four categories of the gamma distribution of 
among-site rate variation, I’), (ii) constant population size, Bayesian SkyGrid, and 
exponential population tree priors, and (iii) strict and relaxed (uncorrelated 
lognormal) molecular clocks (Extended Data Fig. 2). In all cases the MCMC 
was run until convergence was (easily) achieved (Extended Data Fig. 2). A broadly 
similar substitution rate (0.99 X 10° substitutions per site per year), and relatively 
strong temporal structure (correlation coefficient = 0.87, R= 0.76), was 
obtained using a regression of root-to-tip genetic distance in the ML (PhyML) 
tree against sampling date using the Path-O-Gen program (http://tree.bio.ed. 
ac.uk/software/pathogen/). The posterior distribution of trees obtained from the 
BEAST analysis was also used to obtain the maximum clade credibility (MCC) 
tree for these sequences. For simplicity, we used the results of the HKY+T, 
constant population size, strict molecular clock analysis to create the MCC tree 
as this had the narrowest distribution (Extended Data Fig. 2). Prior to inference 
of the MCC tree, 10% of the runs were removed as burn-in. 

Glycoprotein RNA editing. The RNA editing site of the GP gene consists of 7 U 
residues; co-transcriptional stuttering can result in transcripts with more or less A 
residues. The resulting frameshifts allow for the expression of distinct glycopro- 
teins called sGP (7 A), GP (predominantly 8 A), and ssGP (predominantly 6 A). 
Deep sequencing revealed 8 U at ~1% and 7 U at ~99%, values similar to those 
described in ref. 4. 

Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized. The investigators were not blinded to 
allocation during experiments and outcome assessment. 
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Extended Data Figure 1 | Maximum likelihood phylogenetic tree of EBOV The tree is rooted according to the topology seen in the MCC tree (Fig. 1) under 


from the 2014-2015 outbreak in West Africa. Published sequences from the assumption of a molecular clock, although the observation of three 
Sierra Leone are shown in blue, those from Mali in green, and those from main lineages of EBOV in Guinea is robust to rooting position (including 
Guinea in red. All horizontal branch lengths are scaled to the number of rooting on the oldest sequences from March 2014). 


nucleotide substitutions per site. Bootstrap values are shown for key nodes. 
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Extended Data Figure 2 | Substitution rates and temporal signal. exponential population growth, Bayesian SkyGrid), and molecular clock 
a, Posterior distribution of nucleotide substitution rates (X10 “ substitutions (strict, relaxed lognormal (UCLN)) models. Note the extensive overlap among 
per site per year) in the 195 sequence EBOV data set and using a range of estimates under a range of models. b, Root-to-tip regression of genetic distance 


substitution (HKY+T and GTR+1), demographic (constant population size, against day of sampling for the 195 sequence EBOV data set. 
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Extended Data Table 1 | Guinean EBOV samples sequenced in this study 


Virus Sampling date Virus Sampling date 

Conakry-505 24 July 2014 Conakry-1213 02 October 2014 
Conakry-509 24 July 2014 Conakry-1215 02 October 2014 
Siguiri-517 24 July 2014 Conakry-1249 04 October 2014 
Conakry-507 24 July 2014 Coyah-1274 04 October 2014 
Kouroussa-531 27 July 2014 Coyah-1277 04 October 2014 
Conakry-573 02 August 2014 Coyah-1278 04 October 2014 
Gueckedou-633 12 August 2014 Conakry-1250 04 October 2014 
Macenta-645 14 August 2014 Coyah-1279 04 October 2014 
Conakry-653 15 August 2014 Conakry-1298 06 October 2014 
Conakry-657 16 August 2014 Coyah-1316 07 October 2014 
Conakry-678 19 August 2014 Coyah-1321 07 October 2014 
Conakry-684 20 August 2014 Kerouane-1331 07 October 2014 
Conakry-691 21 August 2014 Coyah-1333 07 October 2014 
Conakry-701 22 August 2014 Coyah-1320 07 October 2014 
Conakry-740 26 August 2014 Coyah-1327 07 October 2014 
Conakry-742 26 August 2014 Coyah-1339 08 October 2014 
Conakry-768 27 August 2014 Conakry-1340 08 October 2014 
Conakry-786 28 August 2014 Conakry-1342 08 October 2014 
Conakry-787 28 August 2014 Coyah-1355 09 October 2014 
Dubreka-789 29 August 2014 Forecariah-1365 09 October 2014 
Coyah-955 11 September 2014 Conakry-1371 10 October 2014 
Conakry-976 14 September 2014 Coyah-1374 10 October 2014 
Forecariah-989 15 September 2014 Coyah-1394 11 October 2014 
Conakry-1027 18 September 2014 Coyah-1436 13 October 2014 
Conakry-1043 19 September 2014 Conakry-1445 14 October 2014 
Conakry-1039 19 September 2014 Conakry-1454 14 October 2014 
Kindia-1047 20 September 2014 Conakry-1480 15 October 2014 
Conakry-1059 21 September 2014 Conakry-1481 15 October 2014 
Coyah-1063 21 September 2014 Conakry-1491 16 October 2014 
Forecariah-1069 21 September 2014 Conakry-1551 18 October 2014 


Conakry-1081 22 September 2014 Forecariah-1567 18 October 2014 
Dalaba-1104 24 September 2014 Forecariah-1568 18 October 2014 
Dalaba-1116 24 September 2014 Conakry-1561 19 October 2014 
Conakry-1105 24 September 2014 Forecariah-1571 20 October 2014 
Conakry-1120 25 September 2014 Nzeerekore-1622 22 October 2014 
Conakry-1121 25 September 2014 Kindia-1648 23 October 2014 
Conakry-1128 25 September 2014 Forecariah-1623 24 October 2014 
Conakry-1129 25 September 2014 Conakry-1651 24 October 2014 
Conakry-1149 27 September 2014 Coyah-1652 24 October 2014 
Conakry-1193 28 September 2014 Coyah-1686 24 October 2014 
Conakry-1205 02 October 2014 Coyah-1689 25 October 2014 
Conakry-1210 02 October 2014 Coyah-1690 25 October 2014 
Dalaba-1211 02 October 2014 
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Cyanate as an energy source for nitrifiers 


Marton Palatinszky’, Craig Herbold', Nico Jehmlich?, Mario Pogodal, Ping Han!, Martin von Bergen?*4, llias Lagkouvardos't, 
Soren M. Karst*, Alexander Galushko'+, Hanna Koch!, David Berry’, Holger Daims! & Michael Wagner’ 


Ammonia- and _ nitrite-oxidizing microorganisms are collectively 
responsible for the aerobic oxidation of ammonia via nitrite to nitrate 
and have essential roles in the global biogeochemical nitrogen cycle. 
The physiology of nitrifiers has been intensively studied, and urea and 
ammonia are the only recognized energy sources that promote the 
aerobic growth of ammonia-oxidizing bacteria and archaea. Here we 
report the aerobic growth of a pure culture of the ammonia-oxidizing 
thaumarchaeote Nitrososphaera gargensis' using cyanate as the sole 
source of energy and reductant; to our knowledge, the first organism 
known to do so. Cyanate, a potentially important source of reduced 
nitrogen in aquatic and terrestrial ecosystems’, is converted to ammo- 
nium and carbon dioxide in Nitrososphaera gargensis by a cyanase 
enzyme that is induced upon addition of this compound. Within 
the cyanase gene family, this cyanase is a member of a distinct clade 
also containing cyanases of nitrite-oxidizing bacteria of the genus 
Nitrospira. We demonstrate by co-culture experiments that these 
nitrite oxidizers supply cyanase-lacking ammonia oxidizers with 
ammonium from cyanate, which is fully nitrified by this microbial 
consortium through reciprocal feeding. By screening a comprehensive 
set of more than 3,000 publically available metagenomes from envir- 
onmental samples, we reveal that cyanase-encoding genes clustering 
with the cyanases of these nitrifiers are widespread in the environment. 
Our results demonstrate an unexpected metabolic versatility of nitri- 
fying microorganisms, and suggest a previously unrecognized import- 
ance of cyanate in cycling of nitrogen compounds in the environment. 

Cyanate is a small molecule containing carbon, nitrogen, and oxy- 
gen atoms. It is formed spontaneously within cells from urea and 
carbamoyl phosphate**, but also occurs in the environment where it 
may be produced from the chemical/physicochemical decomposition 
of urea or cyanide**. Until recently, environmental cyanate concentra- 
tions were difficult to obtain, as the available analytical methods were 
inadequate for sub-micromolar detection. Furthermore, cyanate is not 
chemically stable and decomposes relatively slowly to ammonium and 
carbon dioxide (CO). The decomposition rate is linearly related to the 
concentration of cyanate and thus the compound is reasonably stable 
at low concentrations (Extended Data Fig. 1). A more sensitive chro- 
matographic method for the detection of cyanate in aquatic samples 
was recently developed and revealed nanomolar-range cyanate con- 
centrations in seawater®. These cyanate levels are in the same order of 
magnitude as ammonium concentrations typically found in oligo- 
trophic marine environments’. Consistently, cyanate has been postu- 
lated to serve as a nitrogen source for the growth of certain marine 
cyanobacteria under nitrogen limitation’*. For assimilation of cyanate, 
these phototrophic bacteria convert it to ammonium and CO) with the 
enzyme cyanase (also known as cyanate lyase and cyanate hydratase). 
Cyanases are also found in a variety of other bacteria and archaea, 
where they have been reported to play a role in nitrogen assimilation or 
detoxification as cyanate chemically modifies proteins through carba- 
mylation”'®. However, to our knowledge, no microorganism has been 
described that can grow using cyanate as a source of energy and 
reductant. 


Nitrifying microorganisms are generally considered to be highly 
specialized chemolithoautotrophs that oxidize either ammonia or 
nitrite to generate energy and reductant for growth, and use CO, as 
a carbon source. Over the past few decades, however, this perception 
has been challenged by several studies'’'’. For example, it was 
reported that uncultured thaumarchaeota closely related to the 
ammonia oxidizer Nitrososphaera gargensis thrive in wastewater 
treatment plants using unknown sources of energy and reductant 
other than ammonium or urea™ and that nitrite oxidizers of the genus 
Nitrospira can derive energy for growth by aerobic hydrogen 
oxidation’. Furthermore, the growth of some thaumarchaeotal 
ammonia oxidizers is stimulated by the addition of organic com- 
pounds’*, while others may be obligate mixotrophs’’. However, 
aerobic growth of ammonia-oxidizing microorganisms has thus far 
only been demonstrated in the presence of urea or ammonium. 

Recently, we sequenced the genome of N. gargensis enriched from a 
thermal spring sample’. Unexpectedly, a gene encoding a putative 
cyanase was detected close to the gene of a putative cyanate/nitrite/ 
formate transporter’’. In contrast, all other sequenced genomes of 
archaeal or bacterial ammonia oxidizers, including its closest relative 
Nitrososphaera viennensis”’, do not contain a cyanase-encoding gene. 
As N. gargensis shares most central metabolic pathways with other 
thaumarchaeotes, it is very unlikely that it requires cyanase for detoxi- 
fication of internally produced cyanate. We therefore hypothesized 
that N. gargensis may instead use cyanate as a source of energy and 
reductant for growth. Prior to testing our hypothesis, we obtained a 
pure culture of N. gargensis by repeated serial dilutions over a period of 
16 months (see Supplementary Information). The pure culture of 
N. gargensis grew well in the presence of 2 mM ammonium and growth 
was not inhibited by addition of 0.5 mM cyanate. After a short period 
of growth in the presence of both ammonium and cyanate, the biomass 
of N. gargensis was transferred to a medium in which cyanate was the 
sole source of energy, reductant and nitrogen. In this medium, 
N. gargensis stoichiometrically converted cyanate via ammonium to 
nitrite (Fig. 1a), and cyanate degradation was the rate-limiting step of 
the overall process (Extended Data Fig. 1). A much slower conversion 
of cyanate to ammonium, reflecting chemical decay, was observed 
in control experiments with equal amounts of dead biomass of 
N. gargensis (Fig. 1b). Notably, growth of N. gargensis in the medium 
containing only cyanate as an energy source was demonstrated by total 
protein measurements (Fig. 1c) and by a quantitative polymerase 
chain reaction (qPCR) assay targeting its 16SrRNA gene (Extended 
Data Fig. 2). During growth on 0.5 mM cyanate, N. gargensis showed, 
according to total protein measurements, a mean generation time of 
136.3h (£11.4 (s.d.)), which is slightly higher than the mean genera- 
tion time observed during growth on 0.5 mM ammonium, which was 
determined to be 113.4h (+6.1). This difference might reflect the 
toxicity of cyanate, despite the presence of a cyanase, or the additional 
energy demand for the synthesis of cyanase during growth. Proteomic 
analyses revealed that on first exposure of N. gargensis to 0.5 mM 
cyanate for 48h, cyanase was the most strongly induced protein 
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Figure 1 | N. gargensis grows on cyanate. a, Concentration changes of 
cyanate, ammonium, and nitrite during the growth of N. gargensis in a 
mineral medium containing 0.5 mM cyanate as the sole source of energy and 
reductant. Arrows indicate additions of 0.5 mM cyanate. b, Control 
experiment with an identical amount of dead biomass of N. gargensis. Nitrite 
was added at different time points, as indicated by the arrows, to mimic the 
conditions in the experiment with living biomass. All experiments shown in 
panels a and b were performed in four replicates and the chemical 
measurements were done in three technical replicates (averaged). Data points 
are mean values of four biological replicates, error bars show s.d. c, Total 
protein concentration of N. gargensis during growth on cyanate. For 
comparison, the respective protein concentration of N. gargensis after growth 
in medium with 0.5 mM ammonium is presented. Protein concentration 
increased 4.99-fold during growth on ammonium and 3.81-fold during 
growth on cyanate over 263 h. Significance was calculated by paired t-test, 
*P < 0.05 compared to 0h. Columns show mean values of four biological 
replicates; error bars show s.d. Biomass increase was independently 
confirmed by qPCR (Extended Data Fig. 2). 
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(Extended Data Fig. 3; 32-fold change; mean from triplicates), 
confirming its key role in growth on cyanate. However, the putative 
cyanate/nitrite/formate transporter encoded in the same genomic 
region was not detected, despite the fact that a protocol optimized 
for extraction of membrane proteins was applied (see ‘Proteomic ana- 
lysis’ section of Methods). This is probably due to the fact that cyanate 
diffuses through biological membranes at millimolar concentrations”. 
Interestingly, cyanate conversion was also observed in N. gargensis 
cultures without a previous period of growth in the presence of ammo- 
nium and cyanate. Furthermore, conversion of cyanate to nitrite by 
N. gargensis could also be detected at a tenfold lower concentration 
of the compound (0.05 mM) (Extended Data Fig. 4). 

While N. gargensis is the only ammonia-oxidizing microorganism 
with a sequenced genome in which a cyanase-encoding gene is present 
(which was probably acquired from a Nitrospira strain via lateral 
gene transfer'®), all nitrite oxidizers for which a genome sequence is 
available contain a gene annotated as cyanase (Extended Data Table 1). 
To test whether these genes are functional, we performed experiments 
with a pure culture of the nitrite oxidizer Nitrospira moscoviensis, 
which possesses a cyanase closely related to that of N. gargensis. 
After 96h of incubation in the presence of around 1 mM cyanate, N. 
moscoviensis degraded significantly more cyanate, causing ammonium 
release from cells, than did a negative control that included an identical 
amount of dead biomass of this strain, demonstrating that N. moscov- 
iensis is capable of cyanate degradation (Extended Data Fig. 5). In a 
separate experiment, addition of 1 mM cyanate only decreased nitrite 
oxidation rates slightly in N. moscoviensis, while higher concentrations 
showed a stronger effect (Extended Data Fig. 6). The presence of a 
cyanase in the genomes of all nitrite oxidizers might reflect that these 
nitrifiers make more cyanate as a side product of their metabolism 
than ammonia-oxidizing microorganisms. Cyanate is produced from 
both carbamoyl phosphate metabolism and urea formation, and while 
the enzymatic repertoire involved in these processes is highly similar 
between ammonia oxidizers and nitrite oxidizers, many members of 
the latter group (but also some thaumarchaeotes) do not contain 
enzymes for degradation of internally produced urea (Extended 
Data Table 1). In addition, it is possible that nitrite-oxidizers continu- 
ously import cyanate from the environment, as some of their trans- 
porters for uptake of environmental nitrite are also capable of 
transporting cyanate*’. In both scenarios the presence of a cyanase 
enzyme is beneficial for nitrite oxidizers because it allows them to 
detoxify cyanate, and the formed ammonium is not only available 
for assimilation but after secretion (Extended Data Fig. 5) might also 
serve as a source of energy and reductant for ammonia oxidizers, 
which typically grow in close vicinity to nitrite oxidizers**”’. The sub- 
sequent activity of the ammonia oxidizers leads to the formation of 
nitrite, which can then be consumed by the nitrite oxidizers (Fig. 2a). 
This reciprocal feeding pattern would enable nitrite oxidizers as well as 
ammonia oxidizers without a cyanase to collectively convert cyanate 
for energy and reductant generation. We tested this hypothesis by 
establishing a co-culture of the ammonia-oxidizing bacterium 
Nitrosomonas nitrosa Nm90 (ref. 24), which has no cyanase activity 
but is not inhibited in its activity by 1 mM cyanate (Extended Data 
Fig. 7), with the cyanase-encoding nitrite oxidizer N. moscoviensis. 
Consistent with the reciprocal feeding hypothesis, the co-culture stoi- 
chiometrically converted cyanate to nitrate (Fig. 2c and Extended Data 
Fig. 8), and fluorescence in situ hybridization with specific 16S rRNA- 
targeted probes revealed that dense clusters containing both nitrifiers 
had formed (Fig. 2b). Conversion rates of cyanate to nitrate were 
accelerated by addition of ammonium at the start of the experiment, 
allowing consortium members to gain energy and reductant before 
interspecies cyanate degradation was fully established (Fig. 2d). In 
contrast, no nitrate formation was observed in abiotic control experi- 
ments using the same medium (Extended Data Fig. 9). 

The cyanases found in N. gargensis and members of the genus 
Nitrospira form a deep-branching clade to the exclusion of other 
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Figure 2 | Reciprocal feeding between ammonia and nitrite oxidizers 
during cyanate conversion. a, Schematic illustration of the interaction 
between cyanate-degrading nitrite-oxidizing bacteria (NOB) and cyanase- 
negative ammonia-oxidizing microorganisms (AOM). Solid arrows represent 
conversions of compounds; dashed arrows represent the uptake or release of 
compounds. Green arrows represent conversions used for energy (E) and 
reductant generation. Red arrow shows the conversion of cyanate by the 
cyanase. b, Co-aggregation of Nitrosomonas nitrosa (red) and N. moscoviensis 
(green) in the co-culture experiment shown in panel d after 168h, as revealed 


cultured organisms’®. We searched a collection of 3,000 metagenomic 
data sets available from the Integrated Microbial Genomics (IMG) 
system” and identified 225 additional metagenomic cyanase genes 
(fragments) that are related to the cyanases of these known nitrifiers 
(Fig. 3). These findings show that this novel cyanase gene family is 
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Figure 3 | Nitrososphaera gargensis and Nitrospira cyanases form a distinct 
family containing sequences from various metagenomes. a, Bayesian 80% 
consensus amino acid tree with alignment uncertainty calculated with 
BAli-phy”. For clarity, posterior support is shown only for the branch 
separating this cyanase family from other cyanases. Cyanases from nitrite- 
oxidizing bacteria are indicated by blue branches. PP, posterior probability. 
b, Bayesian 80% consensus amino acid tree of the Nitrososphaera/Nitrospira 


by fluorescence in situ hybridization. c, d, Concentration changes of cyanate, 
ammonium, nitrite, and nitrate during the growth of the cyanase-negative 
ammonia-oxidizing bacterium N. nitrosa and the cyanase-positive nitrite 
oxidizer N. moscoviensis in a mineral medium containing 1 mM cyanate (c) or 
1mM cyanate and 1mM ammonium (d). Error bars show s.d. of three 
technical replicates. For each experiment, three biological replicates were 
performed (one replicate is displayed in panels ¢ and d, all replicates including 
mass balances are shown in Extended Data Fig. 8). Note that N. nitrosa did not 
grow equally well in all replicates. 


widespread in the environment. Most of these cyanases were located 
on very small contigs, preventing an independent phylogenetic clas- 
sification of the organisms carrying these genes. The metagenomic 
cyanase fragments most closely related to N. gargensis (47-55% 
amino acid similarity) were retrieved from three different peat and 
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cyanase family that contains separate well-supported Nitrososphaera-related 
(red) and Nitrospira-related (blue) clades. Metagenomic cyanase sequences 
that showed more than 99% amino acid similarity were clustered using 
Usearch”. Beside each metagenomic sequence, the total number of clustered 
sequences (S) and the number of metagenomic data sets (M) from which 
they were retrieved is displayed. Scale indicates the number of substitutions 
per site. 
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permafrost soils in Alaska, while the sequences most closely affiliated 
with Nitrospira cyanases (67-80% amino acid similarity) were mostly 
found in temperate forest and agricultural soil from lower latitudes as 
well as in lakes, freshwater sediment and groundwater, matching the 
known distribution of Nitrospira in a broad range of different ecosys- 
tems” (Fig. 3b). 

Our findings show that an archaeal ammonia oxidizer can grow on 
cyanate, utilizing it as the sole source of energy, reductant, and nitro- 
gen. Furthermore, nitrite oxidizers of the genus Nitrospira (and prob- 
ably all nitrite oxidizers) can convert cyanate to ammonium and are 
capable of fully nitrifying it through a newly discovered type of recip- 
rocal feeding with cyanase-negative ammonia oxidizers. This meta- 
bolic capability potentially provides them with a selective advantage in 
environments where cyanate is present, in particular if ammonium 
concentrations are low, and thus may be an important facet of the 
ecology of nitrifiers. Cyanate forms spontaneously by isomerization of 
urea in aqueous solution. The high concentration of urea in many 
ecosystems (ranging from polar seawater and sea ice” to the huge 
areas of urea-fertilized soils in global agriculture) combined with 
the wide distribution of nitrifier-related cyanase genes underscores 
the potential environmental ubiquity of this unique physiology. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Purification and standard cultivation of Nitrososphaera gargensis. A pure 
culture of the ammonia-oxidizing archaeon Nitrososphaera gargensis' was obtained 
through a series of antibiotics treatments (50 mg]! kanamycin; 50 mg] penicillin- 
G; 100 mg rt streptomycin; 100 mg]? carbenicillin; 50 mg]! ampicillin; 20 mg]! 
erythromycin; 20 mg!” ' doxycyclin) and repeated serial dilutions in the ammonia- 
oxidizer medium described below. Purity of the culture was confirmed by phase 
contrast microscopy and by using a specific catalysed reported deposition—fluor- 
escence in situ hybridization (CARD-FISH) assay’, as well as by PCR targeting the 
16SrRNA gene, using various universal eubacterial and archaeal primer combina- 
tions (27f 5’-AGAGTTTGATYMTGGCTCAG-3’; Arch21f 5'-TTCCGGTTGAT 
CCYGCCGGA-3’; 907f 5’-AAACTCAAAKGAATTGACGG-3’; 909r 5'-CCGTC 
WATTCMTTTGAGT-3’; 1390r 5’-GACGGGCGGTGTGTACAA-3’; 1492r 5’- 
GGYTACCTTGTTACGACTT-3’) on DNA extracted by three different DNA isola- 
tion methods (bead-beating with phenol:chloroform extraction; MoBio UltraClean 
Soil DNA kit; FastDNA SPIN Kit for Soil). Any PCR product obtained was cloned 
and sequenced, retrieving only N. gargensis 16S rRNA gene sequences. In addition, 
no growth was observed if the N. gargensis culture was inoculated into various rich 
media such as lysogeny broth, nutrient agar and tryptic soy agar. Subsequently, 
N. gargensis was grown at 46 °C in a modified ammonia-oxidizing archaea (AOA) 
medium” containing (per litre): 50 mg KH,PO.; 75 mg KCl; 50 mg MgSO, X 7H20; 
584 mg NaCl; 4 g CaCO; (mostly undissolved, acting as a solid buffering system and 
growth surface); 1 mlof specific trace element solution (AOA-TES); and 1 ml of 
selenium-wolfram solution (SWS)*'. The composition of TES and SWS is described 
below. Both solutions were added to the autoclaved medium by sterile filtration using 
0.2 um pore-size cellulose acetate filters (Thermo Scientific). The pH of the medium 
was around 8.4 after autoclaving and was kept around 8.2 during growth of 
N. gargensis by the CaCO buffering system. AOA-TES contained (per litre): 
34.4mg MnSO, X 1H,0; 50 mg H;BO3; 70 mg ZnCl; 72.6 mg NayMoO, X 2H,O; 
20mg CuCl, X 2H2O; 24mg NiCl, X 6H20; 80mgCoCl, X 6H20; 1g FeSO, x 
7H,0. All salts except the FeSO, X 7H,O were dissolved in 997.5 ml Milli-Q water 
and 2.5 ml of 37% (smoking) HCl was added before dissolving the FeSO, X 7H2O 
salt. SWS contained (per litre): 0.5gNaOH; 3mgNa,SeO;X5H,0; 4mg 
Na,WO, X 2H,0. After completing the medium, ammonium chloride (from an 
autoclaved 0.2M stock solution) or potassium cyanate (filter sterilized, Sigma 
Aldrich) was added to the medium based on the experimental setups. All cultures 
were grown in the dark in screw-cap Schott bottles (Schott AG) at 46°C 
without shaking. 

Growth of N. gargensis on 0.5 mM cyanate. Cultures were induced with 0.5 mM 
(final concentration) potassium cyanate (KOCN) and 0.5mMNH,ClI 2 days 
before the experiment. After 48h, cyanase-induced cultures were harvested by 
centrifugation (10,000g for 30 min at room temperature), washed in AOA med- 
ium, centrifuged again, and inoculated into 20 ml fresh AOA medium in 50 ml 
CELLSTAR plastic suspension culture flasks (Greiner Bio-One), containing no 
ammonium but 0.5 mM KOCN final concentration. Biomass protein concentra- 
tions used for inoculation were 14.51 + 2.3p1gml~’. Cultures were incubated 
without shaking at 46°C in the dark, for 11 days (264h). All incubations were 
done in four replicates. Samples for chemical, protein and qPCR analysis were 
taken every 12h for the first 4 days, with daily sampling thereafter. After the 
experiment, the cells were harvested and washed as described above and then 
transferred into 20 ml of fresh AOA medium without ammonium and autoclaved. 
After cooling to room temperature, 20 jl TES, 20 pl SWS, and 0.5 mM of filter 
sterilized KOCN was added, and the dead biomass was incubated for 46 °C in the 
dark for 264h. To mimic the production of nitrite in these control experiments 
with dead biomass, NaNO) was added at each sampling time point, according to 
the respective levels of nitrite in the biotic experiments at the next time point, 
resulting in a nitrite concentration, which is always at least as high as in the biotic 
parallels. In both experiments (either living or dead biomass) the pH stayed 
constant around 8.2 + 0.3 during the incubation time. 

Growth of N. gargensis on 0.05mM cyanate. Cultures were induced with 
0.05 mM (final concentration) potassium cyanate (KOCN) and 0.5 mM NH,Cl 
2 days before the experiment. After 48 h, cyanase-induced cultures were harvested 
by centrifugation (10,000g for 30 min, room temperature), washed in AOA med- 
ium, centrifuged again, and inoculated into 20 ml fresh AOA medium in 50 ml 
CELLSTAR plastic suspension culture flasks (Greiner Bio-One), containing no 
ammonium but 0.05mM KOCN final concentration. Cultures were incubated 
without shaking at 46 °C in the dark for 264h. In parallel, abiotic controls were 
started with similar parameters, without biomass. All incubations were done in 4 
replicates. Samples for chemical analysis were taken at 7 time points during the 
11 days. In both experiments (either biotic or abiotic) the pH stayed constant 
around 8.2 + 0.3 during the incubation time. 

Cultivation of N. moscoviensis. The nitrite-oxidizing bacterium Nitrospira mos- 
coviensis was pre-grown in mineral nitrite-oxidizing bacteria (NOB) medium” 
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containing (per litre): 1,000 ml distilled water; 10mgCaCO3; 500 mg NaCl; 
50 mg MgSO, X 7H,O; 150mg KH,PO,, as well as 1 ml filter-sterilized NOB- 
specific trace elements solution (NOB-TES) added after autoclaving. The pH 
was initially adjusted to 8.6 which changed during autoclaving to 7.6. NOB-TES 
contained (per litre): 344mg MnSO,X 1H,O; 50mgH;BO3; 70mg ZnCh, 
72.6 mg NagMoOy, X 2H2O; 20 mg CuCl; X 2H20; 24mg NiCl, X 6H2O, 80mg 
CoCl, X 6H,O; 1 g FeSO, X 7H20. All salts, except FeSO, X 7H2O were dissolved 
in 997.6 ml distilled water and 2.5 ml of 37% (smoking) HCl was added before 
dissolving the FeSO, X 7H,O salt. After autoclaving, 1 mM (final concentration) 
of filter-sterilized NaNO, (if not stated otherwise) was added to the medium. All 
cultures were grown in the dark without shaking at 37 °C. If all nitrite was con- 
sumed, it was re-added to a final concentration of 1 mM. 

Cyanate degradation by N. moscoviensis. Nitrite-oxidizing cultures of N. mos- 
coviensis were supplied with 0.5 mM (final concentration) KOCN and incubated for 
48 h at 37 °C to induce the expression of cyanase. Biomass was harvested (8,500 rpm 
for 15 min at room temperature) and washed twice with fresh NOB medium without 
nitrite. Cells were then transferred into 50 ml NOB medium, which either contained 
1mM NaNO, or 1mM KOCN. Biomass concentrations were inferred from total 
protein concentrations, which were 27.6 + 3.9 1gml | as measured by the Pierce 
BCA Protein Assay Kit (Thermo Scientific). Abiotic experiments were performed by 
adding 1 mM KOCN to the NOB medium in the absence of nitrite. Dead biomass 
controls were performed by treating similar amounts of N. moscoviensis biomass 
fixed with paraformaldehyde (4%) as described above. The dead biomass was incu- 
bated in nitrite-free NOB medium containing 1 mM KOCN. All incubations were 
amended by filter-sterilized 1.5 mM NaHCO; (final concentration). All incubations 
were performed in 250 ml Schott bottles closed by rubber stoppers without shaking at 
37 °C in the dark for 96h. All experiments were performed in triplicate. 

In order to evaluate the effect of increasing cyanate concentrations on nitrite 
oxidation by N. moscoviensis, biomass was harvested (9,300g for 15 min at room 
temperature) and washed twice with fresh NOB medium without nitrite. Cells 
were then transferred into 100ml NOB medium. Incubations were performed 
with 1mM NaNO, and 0 mM, 1 mM, 2 mM, 3mM, 4mM, or 5mM of KOCN. 
As an abiotic control, medium containing 5mM KOCN and 1 mM NaNO, was 
incubated without addition of biomass. All incubations were performed in 250 ml 
Schott bottles closed by rubber stoppers without shaking at 37 °C in the dark for 
60h. All experiments were performed in duplicate. 

Response of Nitrosomonas nitrosa Nm90 to cyanate. The ammonia-oxidizing 
bacterium N. nitrosa Nm90 (strain collection of the University of Hamburg, 
Germany) was grown in AOA medium amended with 10 mM NH,Cl at 37 °C. 
Biomass was harvested (8,500 rpm for 15 min at room temperature) and washed 
twice with fresh AOA medium without ammonium. Cells were inoculated 
into 25ml batches of AOA medium containing either 1mMKOCN alone, 
1mMNH,CI and 1mM KOCN, or 10mMNH,Cl and 1mM KOCN. Cultures 
were incubated in 50 ml CELLSTAR plastic suspension culture flasks (Greiner 
Bio-One) at 37 °C in the dark and shaken at 150 rpm. 

Co-culture experiments with N. nitrosa Nm90 and N. moscoviensis. Nitrite- 
oxidizing cultures of N. moscoviensis were supplied with 0.5 mM (final concentra- 
tion) KOCN and incubated for 48 h to induce the expression of cyanase. Biomass was 
harvested (8,500 rpm for 15 min at room temperature) and washed twice with fresh 
AOA medium without nitrite. N. nitrosa Nm90 was grown in AOA medium sup- 
plied with 10 mM NH,Cl. Biomass was harvested (8,500 rpm for 15 min at room 
temperature) and washed twice with fresh AOA medium without ammonium. 
Biomass concentrations were measured separately for N. moscoviensis and N. nitrosa 
Nm90, inferred from total protein concentrations which were 446.5 1g ml land 164 
tig ml’ respectively in 50 ml final volumes for each culture-stock, measured by the 
Pierce BCA Protein Assay kit (Thermo Scientific). All biomass were combined and 
diluted up to 1 litre serving as a master-mix, which was aliquoted to 100 ml batches 
for the experimental setups resulting in a starting protein concentration 20 times 
less than in the separate stocks measured. Subsequently, either 1mM KOCN, 
1mM KOCN and 1 mM NH,Cl, or only 1 mM NH,Cl was added to the experiments 
(final concentrations). In addition, abiotic experiments were performed by adding 
either 1 mM KOCN or 1 mM KOCN and 1 mM NH, Cl to 100 ml AOA medium. All 
incubations were amended by filter-sterilized 1.5 mM NaHCO; (final concentra- 
tion). All experiments were done in 250 ml Schott bottles closed by rubber stoppers, 
incubated without shaking at 37°C in the dark for 168h. All experiments were 
performed in triplicate. 

Chemical analysis. Nitrite levels were measured by photometry with the sulfanila- 
mide N-(1-naphthyl)ethylenediamine dihydrochloride (NED) reagent method”. 
Ammonium levels were measured photometrically as described previously”. 
Cyanate was measured fluorometrically after derivatization with 2-aminobenzoic 
acid to quinazoline-2,4-dione”, with the modification using fluorescence readout 
(excitation, 312 nm; emission, 370 nm). All photometric and fluorometric reads were 
performed with an Infinite 200 Pro spectrophotometer (Tecan Group AG). 
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qPCR quantification of N. gargensis. A qPCR assay was developed using the 
newly designed N. gargensis 16S rRNA gene-specific primers NG1052 5'-TAGTT 
GCTACCTCTGTTC-3’ and NG1436R 5'-ACCTTGTTACGACTTCTC-3’. The 
qPCR reactions were run with three technical replicates in a Bio-Rad C1000- 
CFX96 Real-Time PCR system, using the Bio-Rad iQ SYBR Green Supermix kit 
(Bio-Rad). 

Fluorescence in situ hybridization. Prior to FISH, calcium carbonate-containing 
formaldehyde-fixed samples were treated with 0.1 MHCI for 3 min. After the 
calcium carbonate dissolved, the cells were centrifuged (3 min, 10,000g) and the 
supernatant discarded. The pellet was resuspended in 50 pl EtOH and PBS (50:50) 
and the cell suspension was spotted on slides. The FISH procedure was performed 
according to the standard protocol with 16S rRNA-targeted probes Ntspa712 
(specific for the phylum Nitrospira’®) and Nso1225 (specific for B-proteobacterial 
ammonia-oxidizing bacteria*®). Images were acquired with a Leica SP8 confocal 
laser scanning microscope (Leica). 

Total protein quantification. Protein concentrations were measured using the 
Pierce BCA protein assay kit (Thermo Scientific). 

Replication of physiological experiments. The number of replications are 
detailed in the subsections for each specific experiment, and were mostly deter- 
mined by the amount of biomass available for the different nitrifier cultures. In all 
experiments, a minimum of three biological replications were used, with the 
exception of one auxiliary experiment: decelerating effect of increasing cyanate 
concentrations on nitrite oxidation by N. moscoviensis (Extended Data Fig. 6). No 
statistical methods were used to predetermine sample size. 

Proteomic analysis. Concentrated N. gargensis biomass was inoculated in 140 ml 
modified AOA medium (amended with 1 mM ammonium final concentration, no 
cyanate) in three replicates. After a pre-incubation for 24h, 40-ml samples were taken 
for proteomic analysis (time point 1) and the remaining cultures were amended with 
0.5mM KOCN and 0.1 mM NH,Cl (final concentrations) and further incubated. 
Cultures were regularly fed afterwards with 0.5 mM KOCN (final concentration), 
keeping the concentration between 0.1 mM and 0.6 mM based on residual KOCN 
levels calculated from the produced nitrite levels, measured every 12h. Forty-eight 
hours after switching to cyanate feeding, 40-ml samples were taken again for pro- 
teomic analyses. Cells in the samples from the two different time points were har- 
vested by centrifugation (9,000g, 30 min, 4 °C) and stored at —80 °C. 

The harvested cell pellets were dissolved in 500 ul urea:thiourea buffer (8 M 
urea; 2M thiourea) and sonicated, using a UP50H homogenizer (Hielscher 
Ultrasound Technology), twice on ice for 1 min (amplitude 0.7; power 70%). 
The samples were then ultracentrifuged (100,000g, 1h, 4 °C), and the supernatant 
was transferred into a fresh reaction tube. Pellets were dissolved in 200 jl prepara- 
tion buffer (100 mM tris-HCl; pH 7.5; 300 mM NaCl; 1% digitonin) and incubated 
overnight at 16 °C with 1,200 rpm shaking. After centrifugation (12,000g, 10 min, 
4 °C), the supernatant was combined with the supernatant of the previous pre- 
paration step. This combined lysate was precipitated with acetone (5X volume, 
ice-cold) by incubation for 1h at —20°C, centrifuged (12,000g, 15 min), and the 
protein pellet was air-dried. Protein concentrations of all extracts were determined 
photometrically using a Bradford assay (Bio-Rad Laboratories). SDS-PAGE pre- 
paration, reduction, alkylation, and proteolytic digestion by trypsin with sub- 
sequent C18-purification were performed as described previously’. Mass 
spectrometry was performed by a Orbitrap Fusion mass spectrometer (Thermo 
Fisher Scientific) coupled to a TriVersa NanoMate (Advion, Ltd). Five microlitres 
of the peptide lysates were separated with a Dionex Ultimate 3000 nano-LC system 
(Dionex/Thermo Fisher Scientific). 

Mass spectrometry (MS) raw files were processed using Proteome Discoverer 
(version 1.4, Thermo Scientific). MS spectra were searched against a N. gargensis 
database (Uniprot/Swiss-Prot, containing 3,786 unreviewed sequence entries) and a 
common Repository of Adventitious Proteins (CRAP) database using the Sequest HT 
algorithm. Enzyme specificity was selected as trypsin with up to two missed cleavages 
allowed using 10 ppm peptide ion tolerance and 0.1 Da MS/MS tolerances. Oxidation 
(methionine) and carbamylation (lysine and arginine) were selected as variable mod- 
ifications, and carbamidomethylation (cysteine) as a static modification. Only pep- 
tides with a false discovery rate (FDR) greater than 1% calculated by Percolator™ and 
a peptide rank equal to 1 were considered as identified. 

Modelling biotic and abiotic cyanate degradation kinetics. For cyanate utiliza- 
tion experiments with N. gargensis, the chemical reaction kinetics for the species 
cyanate, ammonia, and nitrite were modelled as two consecutive first-order reactions. 
Reaction rates were then estimated using ordinary least squares optimization for a 
system of nonlinear equations”, as implemented by the nlsystemfit algorithm in the 
package systemfit”® in R*. For calculation of the abiotic degradation of cyanate and 
isocyanic acid (formed by cyanate in aqueous solution) as a function of temperature 
and pH, established reactions and values from the literature were used. Degradation 
was modelled as three first-order reactions: (1) hydronium-ion-catalysed hydrolysis 
of isocyanic acid; (2) direct hydrolysis of isocyanic acid; and (3) direct hydrolysis of 


cyanate, as described previously”. Published values were used for rate constants and 
their temperature dependence”. A value of 3.7 was used for the acid dissociation 
constant for isocyanic acid (reported to range from 3.29 to 3.92), which has no 
detectable temperature dependence in the range of 0°C to 80°C*. 

Phylogenetics of cyanase genes in published metagenomes. Amino acid 
sequences for all members of the newly discovered ‘Cyanase Family’ (2,425 ent- 
ries) were downloaded from UniProt“ and all predicted amino acid sequences 
annotated as cyanase, cyanate hydratase or cyanate lyase were downloaded from 
the Joint Genome Institute IMG Expert Review (IMG/ER) (3,028 sequences) and 
IMG with Microbiome Samples Expert Review (IMG/MER) (5,476 sequences) 
databases” on 8 August 2014. Cyanase sequences from IMG were filtered accord- 
ing to inferred distance (<1.25 replacements per position) and bit scores (>56) 
from UniProt references using alignment/distance calculation in Mafft* and 
blastp*® (word_size 2, BLOSUM45), respectively. In addition, the predicted amino 
acid sequence of the N. gargensis cyanase was used as a query in a tblastn search 
against publicly available metagenomes of the IMG/M database. Hits were filtered 
by E-value (E-values <107 "°), at least 50% length coverage of the query sequence, 
and assignment to the cyanase superfamily (E-value <10° '°) of the Conserved 
Domain Database Database (CDD)”’. All putative cyanase sequences were filtered 
for length (100 residues) and clustered at 99% identity using USEARCH”. The 
resulting 3,340 cyanase sequences were aligned in Mafft to produce a distance 
matrix and clustered into 100 sequence clusters using the hclust(method = “com- 
plete”) and cutree(k = 100) commands in R*!. Clusters were examined manually 
and three singleton sequences that aligned poorly were discarded. Cyanase from N. 
gargensis and N. moscoviensis were added into the data set. Alignment and phylo- 
geny for the set of 99 representative cyanase genes was calculated using BAli-Phy*® 
with an initial alignment randomization and the number of iterations in each run 
set to 1,100 with a burnin of 600. Posterior tree pools from three independent runs 
were combined to assess bipartition support. The 225 environmental cyanase 
sequences identified in a Nitrososphaera/Nitrospira clade were clustered into 61 
representative sequences using USEARCH at 99% minimum identity. Alignment 
and phylogenetic reconstruction for these representative sequences and ten 
broadly sampled outgroup cyanases was carried out in BAli-Phy (randomize 
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Extended Data Figure 1 | Biotic and abiotic cyanate degradation kinetics. 
a, Degradation of 500 1M cyanate and utilization of ammonium by N. gargensis 
modelled as two consecutive first order reactions (cyanate-ammonium- 
nitrite). Measured data are shown as dots and error bars (mean + s.e.m.) and 
model predictions with estimated rate parameters are shown as solid lines. 
Estimated rate constants were k-yanate—ammonium = 4-872 X 10 *min~! and 
kammonium—nitrite = 1-064 X 10 * min™'. The abiotic hydrolysis of 500 uM 
cyanate in this medium was measured to be much slower than enzymatic 
degradation (k yanate—hydrolysis = 8-71 X 10-° min™’). b, The abiotic 
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degradation of low (100 nM; left) and high (500 uM; right) concentrations 
of isocyanic acid/cyanate across a range of temperatures and pH. 
Degradation was modelled using a well-established model of three first- 
order reactions: (1) hydronium-ion-catalysed hydrolysis of isocyanic acid 
(ky = €7” X @ 7701297). (2) direct hydrolysis of isocyanic acid 

(ky = 77° X @ 71646-69/T). and (3) direct hydrolysis of cyanate” 

(ks = e??3 x @ 85/1) The log-transformed degradation rates are shown 
(asmin '). The conditions that were used to test cyanate degradation by 
N. gargensis are marked with a cross. 
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Extended Data Figure 2 | Nitrososphaera gargensis grows on cyanate. 16S | ammonium are displayed. The gene copy numbers increased 6.49-fold during 


rRNA gene copy numbers of N. gargensis as determined by qPCR at three growth on ammonium and 4.98-fold during growth on cyanate over 263 h. 
different time points during the experiment shown in Fig. 1. For comparison, | Columns show means, error bars show s.d. of four biological replicates. 
the respective gene copy numbers after growth in medium with 0.5mM Significance was calculated by a paired t-test. 


©2015 Macmillan Publishers Limited. All rights reserved 


Cyanate lyase 

L—aspartate dehydrogenase 

Putative nitrogen regulatory protein P-t 

Acetolactate synthase, small subunit 

Putative methyimalony!—CoA epimerase 

Acetyl-CoA/ propionyl—CoA carboxylase, carboxyltransterase subunit 
Uncharacterized protein (KOIDIH) 

Replication factor C small subunit 

Asparagine—tRNA ligase 
2,3—-bisphosphoglycerate—independent phosphoglycerate mutase 
Branched—chain amino-acid aminotransterase 

Alkyl hydroperoxide reductase 

Adenylosuccinate lyase 

Pyruvate, phosphate dikinase 

Phenylalanine—tRNA ligase beta chain 

Threonine synthase 

Putative NADPH-dependent F420 reductase 
Threonine—tRNA ligase 

3-dehydroquinate synthase 

Argininosuccinate lyase 

Succinate—CoA ligase (ADP-forming) alpha subunit 
Putative CRISPR-associated protein, DevR (Cas7) family 
Putative glucosamine—1—phosphate N-acetyltransterase 
TATA-box-binding protein 

Uncharacterized protein (KOI8X0) 

DNA ligase 

Luciterase-like monooxygenase tamily protein 
Uncharacterized protein (KOIE2) 

Putative glyceraldehyde-3-phosphate dehydrogenase, phosphorylating 
Putative 2—alkenal reductase 

Putative Rieske (2Fe—2S] domain protein 

Putative cyclase 

1-pyrroline-5—carboxylate dehydrogenase 
Triosephosphate isomerase 

Putative nitrogen regulatory protein P-Il 

Putative nitrogen regulatory protein P-Il 
Uncharacterized protein (K0ID72) 

Hydroxyacyl—CoA dehydrogenase 

Putative CBS domain protein 

Uncharacterized protein (KOILY8) 

Uncharacterized protein (KOIJ59) 

Putative thioredoxin 

MethyImalony!|—CoA mutase, small subunit 

Keratin, type | cytoskeletal 10 

Uncharacterized protein (KOINGS) 

Putative FeS assembly ATPase SufC 

Putative transcriptional regulator, AsnC family 
Adenylate kinase 

Uncharacterized protein (K0IG19) 

Putative citrate/ citry|-CoA lyase 

ABC efflux transporter, ATP-binding protein 

Putative DNA-directed RNA polymerase subunit M 
Uncharacterized protein (KOIEZ5) 

Probable inorganic polyphosphate/ATP—-NAD kinase 
Enolase 

Putative ammonia monooxygenase subunit B (Fragment) 
Thioredoxin 

Uncharacterized protein (KOIC19) 

TPR repeat protein 

Zn-dependent hydrolase ot the beta-lactamase fold protein 
Ammonia monooxygenase/methane monooxygenase, subunit C 
Uncharacterized protein (KOI7H7) 

Uncharacterized protein (KOILY3) 

Putative polyketide cyclase/dehydrase 

Putative NAD-dependent alcohol dehydrogenase 
Putative universal stress family protein 

Uridylate kinase 

Putative agmatinase 

TRAM domain-containing protein 

2-hydroxyacid dehydrogenase 


LETTER 


® significantly increased 
® significantly decreased 


0.0625 0.125 


Extended Data Figure 3 | Cyanase increase upon exposure of N. gargensis 
to cyanate. Fold-increase and -decrease of the 35 most affected proteins after 
48 h exposure of N. gargensis to 0.5 mM cyanate (in comparison to t = 0 of 

N. gargensis biomass that had not been exposed to cyanate). Experiments were 
performed in three biological replicates. Proteins with a significant difference 
in expression are colour coded. Significance of difference was calculated by 

a one-sample t-test on log-fold induction, with the Benjamini-Hochberg false 
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discovery rate set to 0.05 (P value cutoff 0.00878). For the proteomic analyses, 
10 Lg protein and 500 ng peptide lysate per sample was used. Protein 
abundances within a sample were normalized by dividing the peak area for a 
given protein by the median peak area for all detected proteins. Note that 
during growth on cyanate, N. gargensis experiences much lower concentrations 
of ammonium that during growth on ammonium in batch culture, which 
probably influences the expression patterns of some of the listed proteins. 
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Extended Data Figure 4 | Conversion of 0.05 mM cyanate by N. gargensis. _ were performed in four biological replicates and the chemical measurements 
a, Concentration changes of cyanate, ammonium, and nitrite caused by were done in three technical replicates (averaged). Data points are mean values 
N. gargensis in a mineral medium containing 0.05 mM cyanate as the only of four biological replicates, error bars show s.d. 


source of energy and reductant. b, Abiotic control experiment. All experiments 
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Extended Data Figure 5 | Nitrospira moscoviensis has a functional cyanase. _ replicate were done in three replicates. Data points are mean values, error 
Concentration changes of cyanate and ammonium during incubation of bars show s.d. Asterisks indicate statistical significance between N. moscoviensis 
N. moscoviensis (27.6 + 3.9 1g ml * protein) in a mineral medium containing _ and dead biomass, *P < 0.05, ***P < 0.001. Significance was assessed by two- 
cyanate, but no nitrite. Results from a control experiment with identical way analysis of variance (ANOVA) including Tukey’s honest significant 


amounts of dead biomass of N. moscoviensis are also displayed. Allexperiments difference (HSD) test. 
were performed in triplicate and the chemical measurements from each 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


_ 1200 
= 
a 
O° 1000 —o— 0 mM Cyanate biotic a 
< —o— 0 mM Cyanate biotic b 
800 —7-— 1mM Cyanate biotic a 
—y— 1 mM Cyanate biotic b 
—o- 2 mM Cyanate biotic a 
600 —a— 2 mM Cyanate biotic b 
—>- 3mM Cyanate biotic a 
—o— 3 mM Cyanate biotic b 
400 —“— 5 mM Cyanate biotic a 
—4— 5 mM Cyanate biotic b 
—O- 5 mM Cyanate abiotic a 
200 —@ 5 mM Cyanate abiotic b 
0 
0 10 20 30 40 50 60 70 
time (h) 
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concentrations on nitrite oxidation by N. moscoviensis. Biomass was monitored. Incubations were performed in duplicates. 


incubated for 60h in medium containing 1 mM nitrite and cyanate 
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Extended Data Figure 7 | Nitrosomonas nitrosa Nm90 has no cyanase 
activity and is not inhibited by 1 mM cyanate. Concentration of nitrite during 
incubation of N. nitrosa in a mineral medium containing: 1 mM ammonium 
(filled circles); 1 mM cyanate (filled squares); 1 mM cyanate and 1 mM 


ammonium (open circles); 10 mM ammonium (filled triangles); and 1 mM 
cyanate and 10 mM ammonium (open triangles). All experiments were 


performed in three biological replicates, data points are mean values, error bars 
show s.d. 
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Extended Data Figure 8 | Reciprocal feeding of ammonia and nitrite the cyanase-negative ammonium-oxidizing bacterium Nitrosomonas nitrosa 
oxidizers during cyanate conversion. As activities differed between biological | Nm90 and the cyanase-positive nitrite oxidizer N. moscoviensis in a mineral 
replicates (as often observed for nitrifying strains that are very sensitive to medium containing 1 mM cyanate (a-c) or 1 mM cyanate and 1 mM 


rubber stoppers, contaminants on glass material, etc.), data are displayed for ammonium (d-f). Data points are mean values, error bars show s.d. of three 
each replicate individually. Concentrations of cyanate, ammonium, nitrite,and technical replicates. 
nitrate are displayed as bar (left) and line charts (right) during the growth of 
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Extended Data Table 1 | Presence of cyanase, nitrite/nitrate transporters and enzymes related to urea metabolism in ammonia- and nitrite- 
oxidizing microorganisms with fully a sequenced genome. 
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*Formate-nitrite transporter family: encoded by focA/nirC, has been postulated to transport cyanate’®. 

+Nitrate/nitrite transporter family: encoded by nark, might also transport cyanate due to its chemical similarity to nitrite. 
{ABC transporter of nitrate/sulfonate/bicarbonate: this transporter family has been shown to transport cyanate as well?°. 
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A hemi-fission intermediate links two 
mechanistically distinct stages of membrane fission 


Juha-Pekka Mattila’, Anna V. Shnyrova™, Anna C. Sundborger’, Eva Rodriguez Hortelano’, Marc Fuhrmans’, Sylvia N eumann?, 
Marcus Miiller*, J enny E. Hinshaw”, Sandra L. Schmid! & Vadim A. Frolov”® 


Fusion and fission drive all vesicular transport. Although 
topologically opposite, these reactions pass through the same 
hemi-fusion/fission intermediate’’, characterized by a ‘stalk’ in 
which only the outer membrane monolayers of the two compart- 
ments have merged to form a localized non-bilayer connection”. 
Formation of the hemi-fission intermediate requires energy input 
from proteins catalysing membrane remodelling; however, the 
relationship between protein conformational rearrangements 
and hemi-fusion/fission remains obscure. Here we analysed how 
the GTPase cycle of human dynamin 1, the prototypical membrane 
fission catalyst**, is directly coupled to membrane remodelling. 
We used intramolecular chemical crosslinking to stabilize dyna- 
min in its GDP: AIF, -bound transition state. In the absence of 
GTP this conformer produced stable hemi-fission, but failed to 
progress to complete fission, even in the presence of GTP. 
Further analysis revealed that the pleckstrin homology domain 
(PHD) locked in its membrane-inserted state facilitated hemi- 
fission. A second mode of dynamin activity, fuelled by GTP hydro- 
lysis, couples dynamin disassembly with cooperative diminishing 
of the PHD wedging, thus destabilizing the hemi-fission inter- 
mediate to complete fission. Molecular simulations corroborate 
the bimodal character of dynamin action and indicate radial and 
axial forces as dominant, although not independent, drivers of 
hemi-fission and fission transformations, respectively. Mirrored 
in the fusion reaction’*, the force bimodality might constitute a 
general paradigm for leakage-free membrane remodelling. 
Membrane fission and fusion both involve a pivotal stage, in which 
lipids rapidly rearrange into a new topology under extreme protein- 
driven stress’. It is generally accepted that lipid rearrangements pro- 
ceed in distinct steps, involving the formation of transient highly 
curved non-bilayer intermediate(s)”"°. How conformational changes 
of the protein machinery orchestrate this orderly remodelling of lipids 
remains unknown. This knowledge gap is highlighted in dynamin, the 
founding member of a superfamily of large GTPases implicated in 
membrane fission and fusion events*°. Self-assembly of dynamin into 
helical structures around the necks of deeply invaginated clathrin- 
coated pits and the consequent stimulated GTPase activity drive con- 
formational changes that underpin its role in catalysing membrane 
fission and the release of clathrin-coated vesicles*®. Crystallographic 
studies have provided multiple insights into the nature of these 
GTPase-driven conformation changes. The amino- and carboxy- 
terminal helices of dynamin’s GTPase (G) domain, together with the 
C-terminal helix from the GTPase effector domain (GED), form a 
three-helix bundle, termed the ‘bundle signalling element’ (BSE) 
(Extended Data Fig. la). Crystal structures of a minimal G domain- 
BSE dynamin construct bound to either GMPPCP or the nucleotide 
transition-state analogue GDP: AIF, revealed two distinct conforma- 
tions corresponding to a ~70° swing of the BSE relative to the 


G domain core (Fig. la, inset)'’”*. Thus, akin to a lever arm in motor 
proteins”’, it was proposed that BSE movements transmit and amplify 
transition-state-dependent conformational changes in the G domain 
to affect intra- and/or intermolecular conformational changes required 
for fission’?. Observed only in the context of a minimal dynamin 
construct’””’, whether the dramatic nucleotide-dependent movement 
of the BSE occurs in the full-length protein and how it is transmitted to 
the membrane-interacting PHD and further on to lipids are unknown. 

To gain insight into the functional consequences of this nucleotide- 
dependent conformational change, we used molecular engineering to 
access and control BSE motility in full-length wild-type dynamin 1 
(WT-Dyn1). To this end, we introduced Cys at position 11 into a 
functional reactive-Cys-less (RCL) derivative of WT-Dyn1 (ref. 14) 
for site-specific labelling with a thiol-reactive BODIPY derivative 
and replaced Tyr at position 125 with Trp to yield CW-Dynl 
(Fig. la, inset). This mutant and its BODIPY conjugate retained near 
normal basal and assembly-stimulated GTPase activities (Extended 
Data Fig. 1b, c). To detect BSE movements we used photo-induced 
electron transfer (PET)'*, which results in the quenching of the 
BODIPY label in the BSE (Fig. 1a) by the Trp residue in the G domain 
only if the two moieties reside within a radius of 10A (Fig. 1a, inset)’*. 
When bound to lipid nanotubes (Fig. 1b), the magnitude of PET- 
induced quenching of BODIPY varies in a nucleotide-dependent 
manner, becoming progressively higher along the transition from 
the GTP-bound state (stabilized by GMPPCP) to the GDP:PI 
transition state (stabilized by GDP: AIF, _). This behaviour is consist- 
ent with the GIP-dependent BSE movement predicted by structural 
analyses (Fig. la)'’”’, which further suggest that the BSE pivots 
around a Pro residue (P294) connecting the C-terminal helix of the 
G domain to the core!”””!®. Consistent with this, mutation of P294 
reduces BSE motility and impairs both the GTPase and fission activ- 
ities of dynamin (Extended Data Fig. 2). Together, these data confirm 
that the BSE in full-length dynamin undergoes GTP-dependent con- 
formational changes consistent with a rotation around P294 away 
from the G domain core. 

We next applied site-specific crosslinking between the G domain 
and the BSE to stabilize the ‘transition-state’ conformer. Trp 125 in 
CW-Dyn1 was replaced with Cys to produce CC-Dyn1. Using a series 
of variable-length thiol-specific homo-bifunctional methanethiosulfo- 
nate (MTS) reagents, we identified MTS-4-MTS, which has a theor- 
etical crosslinking span of 7.8 A, as the shortest reagent able to yield 
~100% crosslinking efficiency of CC-Dyn1, as evidenced by a gel shift 
to a faster migrating species (Fig. 1c). This is in good agreement with 
the distance separating the two Cys residues in the transition state 
(Fig. la). Hereafter, we refer to the crosslinked species as CxC-Dynl. 

In solution, CxC-Dyn1 exhibited enhanced GTPase activity and 
self-assembled into rings, similar to GDP*AIF, -bound WT-Dyn1 
(Extended Data Fig. 3a, b), verifying that crosslinking stabilizes the 
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Figure 1 | Stabilization of the transition-state conformer of dynamin. 

a, Cartoon illustrating the mobility of BSE during dynamin’s GTPase cycle. 
Blue, apo (no nucleotide)/GDP-bound; yellow, GMPPCP-bound; yellow/ 
blue, GDP: AIF, -bound transition-state. Inset shows BSE conformation in 
the crystal structures of GMPPCP-bound (yellow; Protein Data Bank (PDB) 
accession 3ZYC) and GDP: AIF, -bound (blue; PDB accession 2X2E) 
G-domain-BSE fusion protein. b, Loss of PET-dependent quenching of 
BODIPY fluorescence after addition of GMPPCP, GTP, or GDP either alone 
or in the presence of AICI, and NaF (that is, GDP’ AIF, ). The decline in 
fluorescence signal in the presence of GTP reflects its hydrolysis. c, SDS— 
polyacrylamide gel electrophoresis (SDS-PAGE) of CC-Dyn1 = crosslinker. 
The faster migrating CxC-Dyn1 is stabilized in the transition state. MW, 


BSE at or near its transition-state conformation. CxC-Dyn1 retains the 
ability of WT-Dyn1 to produce high membrane curvature from flat 
lipid templates (Fig. 1d and Extended Data Fig. 4). Furthermore, like 
CC-Dyn1 (and WT-Dyn1)”, CxC-Dyn1 rapidly assembles on and 
constricts tubular membrane templates (Fig. le). However, unlike 
CC-Dyn1, CxC-Dyn1 failed to produce membrane fission either in 
the presence (Fig. le, f and Supplementary Videos 1 and 2) or absence 
(Fig. 1d and data not shown) of GTP. Reversal of the crosslink with 
dithiothreitol (DTT) led to full recovery of fission activity (Fig. 14), 
indicating that inhibition was due to disruption of dynamin’s 
conformational changes and not to chemical modification of the 
cysteine residues. 
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molecular weight. d, Representative images showing membrane tubulation of 
SUPER templates (=5 independent experiments) or GUVs (3 independent 
experiments) by CC- and CxC-Dyn1 in the absence of nucleotides. Images 
are inverted for clarity. e, Constriction (seen as dark patches) and fission 
activity of CC- and CxC-Dyn1 on fluorescently labelled membrane tethers 
incubated in the presence of 1 mM GTP (see Supplementary Videos 1 and 2; 
representative data from 3 independent experiments). f, Fission activity 
assessed by vesicle release into the supernatant (Supt) from SUPER templates 
of WT-Dyn 1 (filled squares) CC-Dyn1 (filled circles) and CxC-Dyn1 

(open upward-pointing triangles), and CxC-Dyn1 treated with DTT 

(open downward-pointing triangles) (average + s.d., n = 3). 


To determine at which stage fission is disrupted, we analysed the 
membrane activity of CxC-Dynl by measuring protein-induced 
changes of the ionic conductance of the lumen of thin lipid nanotubes 
pulled from a planar reservoir membrane’. In the presence of 
GTP, CC-Dyn1 behaved like WT-Dyn1 (ref. 21): it caused a decrease 
in conductance due to nanotube constriction, followed (in 3 out 
of 3 cases) by an acute drop in conductivity to zero, indicating com- 
plete closure of the tube lumen (Fig. 2a), which for WT-Dyn1 corre- 
lated with membrane fission”*. In contrast, CxC-Dyn1 failed to trigger 
lumen closure in the presence of GTP in 11 out of 11 cases, although it 
retained the ability to constrict and lower nanotube conductance 
(Fig. 2a). In the absence of nucleotide (apo) or with GMPPCP 
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Figure 2 | CxC-Dyn1 produces stable hemi-fission. a-d, Representative traces 
of nanotube conductance changes in the presence of CC-Dyn1 (red traces) 

or CxC-Dyn1 (black traces) obtained in the presence (a) or absence (b) of GTP, 
or in the presence of GMPPCP (c, d). d, Expanded timescale of the flickering 
hemi-fission phenotype, boxed in c. G, indicates conductance normalized to 


(Fig. 2b, c), CC-Dynl, like WT-Dyn1, produced stationary constric- 
tion but no lumen closure in 10 out of 10 cases and 3 out of 5 cases, 
respectively. 

Surprisingly, CxC-Dyn1 in either the apo (Fig. 2b; 10 out of 12 cases) 
or GMPPCP-bound state (Fig. 2c; 8 out of 9 cases) produced complete 
closure of the tube lumen. These observations contrasted with the lack 
of scission of membrane tethers constricted by CxC-Dyn1 (Fig. 1d, e). 
The lumen closure could also correspond to hemi-fission, a state 
characterized by self-merger of the inner monolayer of the nanotube 
membrane without rupture of the outer one’. In support of 
this interpretation, we observed the occurrence of long-lived (up to 
seconds; Fig. 2d) flickering events that indicate reversible formation 
of a hemi-fission intermediate, both in the absence of nucleotide 
(3 out of 10 closure events) and in the presence of GMPPCP 
(Fig. 2c, d; 2 out of 8 closure events). Flickering events were also 
occasionally detected with WT-Dyn1, but these were highly transient 
(milliseconds) intermediates *'. 

Cryo-electron microscopy (cryo-EM) analyses provided additional 
evidence for the formation of hemi-fission intermediates. Upon close 
examination of individual liposomes tubulated by CxC-Dynl, we 
observed frequent examples of highly constricted tube segments in 
which the inner luminal diameter of the tube was no longer discernable 
(Fig. 2e, arrows, insert). Such putative hemi-fission events were 
observed about five times more frequently in tubules decorated by 
CxC-Dyn1 (3.87 + 1.7 per um protein-coated tubes) as compared to 
CC-Dyn1 (0.86 + 1.3 per jim protein-coated tubes). As the inner dia- 
meter can no longer be resolved at these putative sites of hemi-fission 
(Fig. 2e, inset), we measured their outer diameter (32 + 4.83 nm) and 
found that they were even narrower than those previously measured 
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the nanotube conductance before protein addition. e, Cryo-EM images 
(representative examples from 4 independent experiments) of membrane 
tubulation by CxC-Dyn1 in the presence of GMPPCP. Arrows indicate putative 
hemi-fission events detected by the loss of a defined inner leaflet of the bilayer 
occurring at sites of super-constriction (see inset). Scale bars, 100 nm. 


for the super-constricted tubes formed by Dyn1(K44A) in the presence 
of GTP (37 nm) (ref. 22). Given that the latter had an inner lumenal 
diameter of 4nm (ref. 22), these data further support our conclusion 
that CxC-Dyn1 stabilizes a hemi-fission intermediate. Interestingly, 
long-ordered protein lattices were not observed at these sites, suggest- 
ing that the hemi-fission transformation is predominantly driven by 
small protein oligomers”’. 

The ability of CxC-Dyn1 to generate hemi-fission in the absence of 
nucleotide suggests that conformational changes in the BSE are 
somehow transmitted through the stalk to the PHD (Fig. 1a) to alter 
dynamin-membrane interactions and enhance its ability to remodel 
membranes (Fig. 3a). To test this we measured the nature of dynamin- 
membrane interactions using fluorescence resonance energy transfer 
(FRET) between Trp residues in the PHD and dansyl lipids in the 
target membranes*’. CxC-Dyn1 exhibited a nearly 20% increase in 
the dansyl fluorescence emission upon Trp excitation compared to 
either WT- or CC-Dyn1 (Fig. 3b and Extended Data Fig. 5a), suggest- 
ing increased membrane penetration of the PHD in the transition 
state. Consistent with this interpretation, membrane binding of 
CxC-Dyn1 also displayed a decreased sensitivity to salt extraction 
(Fig. 3c), indicative of increased hydrophobic versus electrostatic inter- 
actions with the membrane”’. Together these data provide direct evid- 
ence for the enhanced ‘membrane wedging activity of transition-state 
dynamin (Fig. 3b, insets). 

The hemi-fission activity of CxC-Dyn1 in the apo state contrasts 
with its inability to produce either hemi-fission or complete fission in 
the presence of GTP (Fig. 2a—d). To understand this paradoxical effect 
of GTP on CxC-Dyn1, we further examined the nature of the mem- 
brane constriction of membrane nanotubes produced by CxC-Dyn1. 
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Figure 3 | CxC-Dyn1 displays enhanced membrane wedging activity and 
altered scaffolding properties. a, Cartoon illustrating transmission of 
transition-state BSE conformational information through the stalk to the PHD. 
b, FRET between PHD Trp residues and dansyl lipids measuring the relative 
membrane insertion of CC- and CxC-Dyn1] (average + s.d., n = 3) (see 
Extended Data Fig. 5 for complete spectra). Fp and F correspond to fluorescence 
intensities of dansyl-labelled liposomes in the absence and presence of FRET 
donors, respectively. c, Hydrophobic character of membrane insertion of 
WT-Dyn1 (filled squares), CC-Dyn1 (filled circles) and CxC-Dyn1 (open 
upward-pointing triangles) measured by resistance to salt extraction 

(average + s.d., n = 3). d, Differential behaviour of nanotubes to vertical 
displacement of the patch-pipette depending on the nature/persistence of the 
protein scaffold. Long scaffolds formed by WT- or CC-Dyn1 in the presence of 
GMPPCP prevent retraction of the nanotube into the reservoir when 
shortened, and hence there is no change in tube conductance (left). Short/ 
flexible scaffolds formed by CxC-Dyn1 in the presence of GTP allow free 
movement of membranes back into the reservoir, with concomitant increase in 
conductance (right). e, Addition of GTP to GUVs previously tubulated by 
preassembled CxC-Dyn1 (3 independent experiments) promotes tubule 
retraction towards the vesicle membrane. The tubules remain constricted 
during retraction (see Supplementary Video 3). Images are inverted for clarity. 
f, Concentration dependence and cooperativity of the assembly-stimulated 
GTPase activity of WT-Dyn1 (filled squares), CC-Dyn1 (filled circles) and 
CxC-Dyn1 (open upward-pointing triangles) measured on 100 nm L-a- 
phosphatidylinositol-4,5-bisphosphate (PIP2)-containing liposomes by 
quantifying the release of inorganic phosphate (P;) (average + s.d., n = 3). 


The nanotube conductance characterizing stationary constriction pro- 
duced by CxC-Dyn1 in the presence of GTP (G,,= 0.22 + 0.1) was 
comparable to that produced by CC-Dyn1 (G,,= 0.27 + 0.05) in the 
absence of nucleotide (Fig. 2a, b). Such tight membrane constriction is 
traditionally associated with polymerization of a rigid helical scaffold 
that, as with CC-Dyn1, prevents retraction of the underlying con- 
stricted nanotube to the reservoir, and the accompanying increase in 
the tube conductance”! (Fig. 3d, left). In contrast, the length of the 
nanotube constricted by CxC-Dyn1 could be freely decreased in the 
presence of GTP, seen as an increase in conductance (Fig. 3d, right). 
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Figure 4 | The two stages of dynamin-catalysed membrane fission. 

a, Coarse-grained simulations revealed formation of a stable hemi-fission 
intermediate (wormlike lipid micelle, middle panel) separating the two 
different stages of membrane fission. Axial cross-sections of representative 
snapshots from the simulation runs are shown. Localized radial constriction of 
a membrane tube by a two-protein-mimetic ring system (effective radius 

5-6 nm, inter-ring distance ~10 nm) triggered hemi-fission (red arrow). The 
tilt characterizes the local membrane orientation imposed by the disks”° (see 
Extended Data Fig. 6a for the ring description). The micelle intermediate 
remained stable under simulation conditions (see Extended Data Fig. 7) unless 
a moderate axial force was applied to cause its rupture, thus completing the 
fission reaction (blue arrow). The rectangular box indicates the dimensions of 
the wormlike micelle (~9nm X 5.5 nm). b, Model of distinct dynamin 
activities and conformational changes mediating the two stages of dynamin- 
catalysed membrane fission that are required to form the metastable hemi- 
fission intermediate and then to drive full fission (see Extended Data Fig. 8). 


This ‘weakening’ of the dynamin scaffold by GTP can be associated 
with GTP-driven depolymerization and/or loosening of the scaf- 
fold’*”°. Indeed, addition of GTP to tubes produced by CxC-Dyn1 
from giant unilamellar vesicles (GUVs) or supported bilayers 
with excess membrane reservoir (SUPER) templates caused their 
partial retraction, while they remained constricted (Fig. 3e and 
Supplementary Video 3). Moreover, the GTPase activity of mem- 
brane-bound CxC-Dyn1 (Fig. 3f) is significantly reduced relative to 
WT- or CC-Dynl, but its membrane binding is unaffected. 
Importantly, CxC-Dyn1 is no longer released from membranes during 
GTP hydrolysis (Extended Data Fig. 5b, c). Together, these data sug- 
gest that although GIP induces depolymerization, the impaired 
hydrolysis and enhanced membrane interactions of CxC-Dyn1 pre- 
vent its release from lipid templates in the presence of GIP. The 
resulting loosened scaffolds retain their curvature activity but fail to 
produce hemi-fission, corroborating the notion that formation of this 
fission intermediate requires a critical degree of dynamin oligomeriza- 
tion, for example, a single rung of two-start helix”!”. 

To test the generality of our findings that localized membrane con- 
striction by a short membrane-inserting scaffold yields stable hemi- 
fission but not complete fission, we applied coarse-grained computer 
simulations previously used to analyse membrane fusion”**”*, and more 
recently dynamin-mediated membrane fission”®. The main stages of 
dynamin-driven fission were remarkably well reproduced in these 
simulations by modelling simple constriction of a cylindrical lipid 
bilayer using a system of amphiphilic disks arranged in rings” 
(Extended Data Figs 6 and 7). Intriguingly, ring constriction could 
not bring the simulations past the hemi-fission state**. To obtain 
mechanistic insights into this disruption of the fission reaction, we 
further analysed the structure and stability of the hemi-fission 
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intermediate using similar simulation modelling. Closely imitating the 
localized constriction of the membrane nanotube by CxC-Dynl 
(Fig. 4a and Extended Data Fig. 6a, b), we induced self-mergers of 
the inner monolayer of the tube that further developed into an 
extended wormlike micelle structure (Fig. 4a and Extended Data 
Fig. 6b). The micelle geometry was reproducible in different simu- 
lation runs (length (L)=9.0+0.9nm, standard deviation (s.d.); 
n = 8; four independent simulation runs). These micelles remained 
stable throughout the observation period even under application of 
moderate membrane tension (see Methods). Relaxation of the geo- 
metric constraints imposed by the ring system (ring ‘disassembly’”*) 
caused shortening of the micelle (to L = 5.2 + 0.6, s.d.;n = 10) without 
rupture (Extended Data Figs 6 and 7), demonstrating that the hemi- 
fission intermediate does not spontaneously rupture even in the 
absence of the protein support. Hence, as for membrane fusion”, 
completion of the fission reaction requires additional energy input 
to overcome the intrinsic lipid resistance and the stabilizing effect of 
the protein scaffold. 

This energy input apparently comes from GTP hydrolysis. 
Importantly, the connection between the G domains and PHD, 
mediated by BSE and disrupted in CxC-Dyn1, is required to deliver 
energy to the hemi-fission intermediate. It is unlikely that this GTP 
hydrolysis-driven conformational change causes additional mem- 
brane constriction because progression of the GTP cycle past the 
transition state diminishes the curvature activity of dynamin’ and 
structural studies clearly associate membrane super-constriction with 
the pre-transition-state dynamin conformer”. Interestingly, in com- 
puter simulations, application of moderate (~0.6 dyn cm~') mem- 
brane tension’” in combination with ring disassembly produced 
immediate rupture of the hemi-fission intermediate. The combination 
of tenfold weaker tension and a gradual increase of the separation 
between rings (for example, due to abrupt loosening of the scaffold’*) 
also mediated the transition from hemi-fission to complete fission 
(Extended Data Fig. 7). Although the mechanics of this transition 
require further investigation, our data suggest that they differ from 
radial constriction and probably involve production of an axial force in 
coordination with disassembly of the dynamin scaffold. 

These findings demonstrate that dynamin implements different 
strategies while mediating sequential topological transitions of inner 
and outer membrane monolayers for fission (Fig. 4b and Extended 
Data Fig. 8). This bimodality, which is probably embedded in the 
molecular design of the proteins that catalyse fission and fusion, 
may constitute a fundamental feature required to coordinate the 
sequential, two-step, remodelling of membrane monolayers required 
for non-leaky formation of hemi-fusion/fission intermediates and sub- 
sequent fusion/fission. It is tempting to speculate that the current 
controversies regarding mechanistic models of dynamin*® are related 
to the previously unappreciated bimodal nature of the fission process. 
That is, the different models may reflect sequential modes of dynamin 
action required for formation and rupture of hemi-fission. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


Protein expression and purification. Sf9 insect cells were transiently transfected 
with complementary DNAs encoding wild-type human dynamin 1 or indicated 
mutants subcloned in pIEx-6 vector (EMD Millipore) for protein production. 
Proteins were purified by affinity chromatography using glutathione 
S-transferase (GST)-tagged Amphiphysin-II SH3 domain as an affinity ligand as 
described previously**. Purified proteins were dialysed overnight in 20mM 
HEPES (pH 7.5), 150 mM KCl, 1 mM EDTA, 1 mM DTT and 10% (v:v) glycerol, 
aliquoted, flash-frozen in liquid N>, and stored at —80 °C. Protein concentrations 
were determined by absorbance at 280 nm using a molar absorptivity coefficient of 
59,820M ‘cm! for Dyn1®“'(P11C/Y125W) and Dyni®“(P11C/Y125W/ 
P294A), 54,445M ‘cm’! for Dyn1®“'(P11C/Y125C), and 56,185M ‘cm! 
for other dynamin-1 proteins. 

Protein labelling. The Cys residues at positions 11 in Dyn1*®“'(P11C/Y125W), 
Dyn1®“(P11C/Y125W/P294<A), and 752 in Dyn1®“(T752C) were selectively 
labelled in the absence of reducing agent using tenfold molar excess of the thiol- 
reactive iodoacetamide derivative of BODIPY-Fl (Life Technologies). After 30 min 
incubation at room temperature, DTT was added to 5 mM to quench the reaction. 
The solution was extensively dialysed against buffer containing 20 mM HEPES 
(pH 7.5), 150 mM KCl, 1mM EDTA and 1 mM DTT to separate unreacted dye 
molecules. After high-speed ultracentrifugation (100,000g) to discard any preci- 
pitated protein, the efficiency of labelling was determined using a molar absorp- 
tivity coefficient of 76,000 M ‘cm ! at 502nm for BODIPY. 

Protein crosslinking. MTS-based homobifunctional crosslinking reagents 
were obtained from Toronto Research Chemicals. Unless otherwise indicated, 
crosslinking of Dyn1®“'(P11C/Y125C) was carried out at room temperature for 
15-30 min with 50 uM MTS reagents. The theoretical spanning distance of MTS 
reagents was derived from ref. 29. For visualization of the crosslinked proteins, all 
unreacted Cys residues were blocked by 10 mM N-ethylmaleimide prior to addi- 
tion of 6X SDS sample buffer. Samples were subsequently resolved on a 7.5% 
polyacrylamide gel followed by Coomassie staining. For reversal of crosslinking 
samples were incubated for 30 min on ice with 20mM DTT. 

Preparation of liposomes, lipid nanotubes and SUPER templates. 1,2- 
Dioleoyl-sn-glycero-3-phosphocholine (DOPC), 1,2-dioleoyl-sn-glycero-3-phospho- 
(1'-rac-glycerol) (DOPG), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 
1,2-dioleoyl-sn-glycero-3-phospho-L-serine (DOPS), L-«-phosphatidylinositol- 
4,5-bisphosphate (PIP), 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine- 
N-(lissaminerhodamine B_ sulfonyl) (RhPE), 1,2-dioleoyl-sn-glycero-3- 
phosphoethanolamine-N-(5-dimethylamino-1-naphthalenesulfonyl (dansyl-PE), 
and C24:1 B-p-galactosylceramide (GalCer) were purchased from Avanti Polar 
Lipids. Cholesterol was from Sigma-Aldrich. Appropriate amounts of lipid stock 
solutions were mixed in a glass tube to obtain the desired compositions. 
The solvent was removed under a gentle stream of nitrogen and the lipid residue 
was subsequently maintained under a reduced pressure for 1-2h. The dry 
lipid film was hydrated for 30 min at room temperature in Milli-Q H,O (for 
preparation of SUPER templates) or 20mM HEPES (pH 7.5), 150mM KCl 
and subjected to three freeze-thaw cycles. The resulting suspensions of multi- 
lamellar vesicles were extruded through polycarbonate membranes of varying 
pore diameters to yield unilamellar liposomes of desired size. Lipid nanotubes 
composed of DOPC:DOPS:PIP2:GalCer (40:15:5:40) were generated using a 
bath sonicator according to procedures described previously”. Supported 
bilayers with excess membrane reservoir (SUPER) templates were prepared 
as previously reported’**’, with minor modifications. Briefly, a 20 ul aliquot of 
an aqueous suspension (5% w:w) of 2.5-j1m-diameter silica microspheres 
(Corpuscular) was added to a NaCl-containing solution of 100 nm liposomes 
(DOPC:DOPG:DOPE:DOPS:PIP3:RhPE = 19:40:20:15:5:1) for a total volume 
of 100 ul (with final lipid and NaCl concentrations of 200 14M and 300mM, 
respectively) in a 1.5ml low-adhesion polypropylene centrifuge tube (USA 
Scientific). This mixture was incubated for 30 min at room temperature with 
intermittent mixing. The templates were subsequently washed four times with 
1 ml of Milli-Q H,O by a low-speed spin (260g) for 2 min in a swinging-bucket 
rotor at room temperature, leaving behind after each wash a 100 pl volume for 
resuspension of the pelleted templates. 

GTPase assay. Basal and assembly-stimulated GTP hydrolysis rates of wild-type 
and mutant dynamins were measured using a Malachite Green-based colorimetric 
assay that detects the inorganic phosphate released during the time course of the 
reaction”. Briefly, indicated concentrations of proteins were incubated at 37 °C in 
the absence (basal) or presence of (assembly-stimulated) 100 nm liposomes pre- 
pared with DOPC:DOPS:PIP, = 80:15:5 (total lipid concentration = 150 1M) ina 
buffer containing 20 mM HEPES (pH 7.5), 150 mM KCl, 1 mM MgCl, and 1 mM 
GTP (Jena Bioscience). Twenty-microlitre aliquots were drawn from the reaction 
mixtures at several time-points and transferred to wells of a 96-well microplate 
containing 5411 0.5M EDTA, thereby quenching the hydrolysis reaction. 


One-hundred and fifty microlitres of Malachite Green stock solution was added 
to each well and the absorbance at 650nm was measured using a microplate 
reader. Free phosphate was determined from the absorbance values using a stand- 
ard curve. The initial rates of GTP hydrolysis were calculated from the linear phase 
of the time course. 

Preparation of giant unilamellar vesicles. Giant unilamellar vesicles (GUVs) 
were formed by spontaneous swelling of lipid films deposited on 40 1m silica 
beads. Briefly, DOPC:DOPE:DOPS:Chol:PIP):RhPE 28:24:15:30:2:1 mixture in 
chloroform (0.05 mg total lipid) was dried in a vacuum for 1h. Then the mixture 
was rehydrated by adding 10 yl of 1mM HEPES buffer, pH 7.0. After vigorous 
mixing, the multilamellar lipid solution was doped with 40 jm plain silica beads 
and deposited on a Teflon film as 4-5 drops of ~2 jl and then vacuum-dried for 
30 min. The beads covered by lipid film were picked from the Teflon film by a thin 
glass pipette, pre-hydrated for 5 min under HO saturated N, atmosphere, and 
then added from the top to a vertically placed plastic pipette tip filled with 5 pl ofa 
pH-buffered sucrose solution. GUVs formed spontaneously on the bead surface 
upon 10 min of gentle hydration at 60 °C. Then the lower end of the tip was briefly 
immersed into a homemade observation chamber filled with 1 ml of buffer 
(150mM KCl, 10mM HEPES, 1mM EDTA, 2mM MgCl), thus transferring 
the beads with the attached and detached GUVs into the chamber. The 
0.13-0.16-mm-thick cover glass of the chamber was pretreated with bovine serum 
albumin (BSA) solution (0.1 gl, 5 min at room temperature) to inhibit lipid 
attachment to the glass surface. GUVs were further monitored by fluorescence 
microscopy, as described later. 

Preparation of lipid nanotubes for the ionic conductance measurements. 
Bilayer lipid membranes (BLMs) were formed from the same lipid composition 
as GUVs on a gilded copper grid (mesh 200, Agar Scientific) pretreated with the 
same lipid mixture (10gl"' total lipid) dissolved in decane:octane (1:1 v/v): a 
small drop of the mixture was deposited across the grid and the solvents were 
then evaporated under argon stream. The grid was mounted on the bottom of an 
observation chamber that was subsequently filled with the buffer containing 
150 mM KCl, 10 mM HEPES, 1 mM EDTA, 2 mM MgClp. Finally, a small amount 
of lipid mixture in squalane (20-30 gl‘, total lipid) was ‘painted over’ the grid 
using a thin brush. Lipid bilayers formed spontaneously on each mesh covered by a 
thick film deposited by the brush. The excess lipid material, expelled to the peri- 
phery of the mesh, formed a toroidal meniscus maintaining the lateral tension of 
the lipid bilayer. 

Lipid membrane nanotubes were pulled from the parent BLM using a nano- 
positioning system based upon high-resolution NanoPZ actuators (Newport 
Corporation) and calibrated piezo-micromanipulator (Newport; 30 mm travel). 
Fire-polished borosilicate patch-pipettes (tip diameter of ~1 mm) were used for 
pulling. The tube formation and manipulation were performed as described earl- 
ier°*', Proteins were delivered with a second micropipette, back-filled with a 
7mM solution of the CC- or CxC-Dyn1 solution in 150mM KCl, 20mM 
HEPES, 1 mM EDTA and 2 mM MgChy. For experiments conducted in the pres- 
ence of nucleotide, the nucleotides were added in equal concentration both to the 
observation chamber and the protein delivery pipette. 

Fission assay. The efficiency of wild-type and mutant dynamins to catalyse the 
release of membrane vesicles from RhPE-labelled SUPER templates was analysed 
by means of a sedimentation assay, as described elsewhere’**’. In brief, an aliquot 
of template suspension was added without mixing to a final volume of 100 pl of 
20 mM HEPES (pH 7.5), 150mM KCl, with 1 mM MgCl, 1 mM GTP, and indi- 
cated protein concentrations. The samples were left undisturbed for 30 min at 
room temperature, the templates subsequently pelleted at 260g for 2 min and 
the supernatants mixed with Triton X-100 to dissolve released vesicles. Total 
membrane fluorescence of templates was determined in a separate reaction by 
adding equal amount of templates to Triton X-100 before pelleting. The fluor- 
escence intensity of the supernatants was read on 96-well plates using a plate 
reader (Bio-Tek Instruments) with excitation and emission monochromators 
set at 530/25 and 590/25 nm, respectively. 

Sedimentation assay. Self-assembly of wild-type and mutant dynamins and their 
GTP hydrolysis-triggered disassembly were assessed by sedimentation after high- 
speed centrifugation. Two identical sets of samples were prepared by incubating 
dynamin (1 1M) for 30 min with or without 400 nm DOPS liposomes (total lipid 
concentration = 300 1M) in 20 mM HEPES, (pH 7.5), 150 mM KCl, 1 mM MgCl, 
in a final volume of 30 il at room temperature. One millimolar GTP or GMPPCP 
was added to one set of samples and both sets were transferred to a 37 °C water 
bath for 5 min. Mixtures were then spun at 20,800g for 20 min in a microfuge 
refrigerated at 4°C to obtain supernatant (S) and pellet (P) fractions. The pellet 
fraction containing liposomes and assembled protein was resuspended in 30 pl 
of the same buffer to obtain equal volumes of S and P fractions. Samples 
were subsequently resolved on a 7.5% polyacrylamide gel and visualized by 
Coomassie staining to evaluate protein levels. Dynamin self-assembly on 
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100 nm DOPC:DOPS:PIP2 = 80:15:5 liposomes in the absence of GTP was quan- 
tified using an identical approach. 

Fluorescence spectroscopy. All fluorescence measurements were carried out with 
0.1HM BODIPY-labelled Dyn1®“'(P11C/Y125W) or Dyn1®“'(P11C/Y125W/ 
P294A) in buffer containing 20mM HEPES (pH 7.5), 150mM KCl and 1mM 
MgCl, using a Fluorolog-3 photon-counting steady-state spectrofluorometer 
(Horiba Jobin Yvon) equipped with double excitation and emission monochro- 
mators, a cooled PMT housing, and a 450 W xenon lamp. Samples (2.4 ml final 
volume) were prepared in 10mm path length quartz cuvettes held at 25 °C and 
continuously stirred with a magnetic stir bar during data acquisition. Where 
indicated, dynamin was incubated with PIP2-containing lipid nanotubes (1:300 
molar ratio of protein to lipid) for 10 min to induce self-assembly. In PET experi- 
ments BODIPY was excited at 490 nm (2.5 nm bandpass) and emission was mon- 
itored at 510 nm (2.5 nm bandpass) with fluorescence intensity values recorded at 
10s intervals (5s signal integration). Nucleotides or AICI, (1 mM final concen- 
tration) were added to the cuvette at indicated time-points. For experiments 
involving GDP: AIF, , 10mM NaF was added to the buffer before data collection 
was initiated. Concentration-matched sample of BODIPY conjugated to 
Dyn1"“'(T752C) was used to establish the level of BODIPY emission intensity 
corresponding to complete loss of PET-induced quenching. 

FRET between PH domain tryptophans and dansyl-PE-containing 400nm 

liposomes (DOPS:dansyl-PE = 90:10) was used to investigate membrane inter- 
action of dynamin proteins, as described elsewhere”. Briefly, 2.4 ml samples com- 
posed of either 0.1 4M protein (donor only) or 5 uM lipid (acceptor only) were 
excited at 280 nm (2 nm bandpass) and their emission spectra recorded between 
315 and 550 nm (4nm bandpass). Increase in dansyl fluorescence due to FRET was 
monitored at 515nm in samples containing both donor and acceptor after a 
20 min incubation. Data in Fig. 3 are presented as F/Fo, where Fy corresponds to 
fluorescence intensity of dansyl-labelled liposomes in the absence of FRET donors, 
and F is the intensity measured. 
Fluorescence microscopy. Fluorescence imaging of RhPE-labelled SUPER tem- 
plates was performed in BSA-coated Nunc Lab-Tek chambered microscope slides 
(Thermo Scientific) using a Nikon Eclipse Ti (Nikon instruments) inverted micro- 
scope equipped with a X 100, 1.45-NA oil-immersion objective and ORCA-Flash 
4.0 CMOS camera (Hamamatsu). An aliquot of template suspension was added to 
200 pl 20mM HEPES (pH 7.5), 150mM KCl, 1mM MgCl, in the presence or 
absence of indicated nucleotides (1 mM final concentration) and allowed to settle 
to the bottom of the chamber. For curvature generation (tubulation) experiments, 
0.5 4M dynamin was added to the observation chamber before templates, and 
imaging was performed after 10-15 min incubation at room temperature. 
Membrane tethers were generated by rolling 20 um silica beads over the surface 
of the SUPER templates through tilting of the observation chamber’. 

The GUVs were monitored using an Olympus IX-70 inverted microscope 
(X150, 1.45-NA objective) equipped with an AndoriXon+ camera (Andor 
Technology). A halogen lamp was used as the excitation source, ensuring minimal 
photobleaching, 550/590 nm excitation/emission wavelengths were used. All 
images were collected and processed using the ImageJ .Manager open source 
software”. 

Electron microscopy. For negative-stain EM, samples (1-3 uM dynamin and 
200 uM DOPS liposomes incubated in the presence or absence of 1 mM GTP or 
GMPPCP for 30 min at room temperature) were absorbed onto carbon-coated 400 
mesh Cu/Rh grids (Ted Pella), stained with 2% uranyl acetate, and imaged in a 
Tecnai 12 (FEI) transmission electron microscope at 120 kV using a 2 X 2 Gatan 
CCD camera. For cryo-EM, a 3.511 sample (prepared as described earlier) was 
placed on a plasma-cleaned (Fishione) Quantifoil holey carbon EM grid (SPI 
Supplies), blotted with filter paper, and flash-frozen in liquid ethane using a 
Leica EM GP (Leica Microsystems). The grids were subsequently stored in liquid 
nitrogen. The vitrified samples were imaged at liquid nitrogen temperature on a 
Tecnai 20 FEG electron microscope (FEI) operating at 200 kV and images were 
collected with a 4 x 4 CCD camera. 

Measurement of ionic conductance through lipid nanotubes. The equivalent 
electrical circuit for nanotubes pulled from planar BLMs has been described 
previously”. The nanotube conductance was measured at 50-100 mV holding 
potential using an Axopatch 200B (Molecular Devices) amplifier. The signal 
was digitized using a PC-44 acquisition board (Signallogic) as described prev- 
iously*®. The current was acquired at voltage-clamp mode of the amplifier, col- 
lected using the acquisition board and processed offline using Origin software 
(OriginLab). The measured conductance of the nanotube in the presence of the 
protein was normalized to the conductance level measured for the nanotube just 
before protein addition. 

Molecular simulations. The simulation method* and the model parameters** 
used were as previously described***?**. The simulations were conducted using a 
molecular dynamics scheme with a dissipative particle dynamics thermostat****”*. 
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The simulations were performed in an ensemble that allows the length of the 
simulation box along the axis of the lipid cylinder to dynamically vary during 
the simulations to keep the tension in that direction constant. 

For modelling lipids, our simulations used a coarse-grained, solvent-free lipid 
model’®***” in which the lipids are represented as linear chains composed of two 
polar head-group particles and eight hydrophilic tail particles. The particles are 
connected by a harmonic bond potential and a soft bond-angle potential, while the 
non-bonded interactions are based on a third-order weighted-density functional 
of the particle densities. This model was shown to successfully reproduce the 
elastic and dynamic properties of lipid bilayers**”* as well as lipid phase behaviour 
and topological transitions****. 

The cylindrical tubes of lipid bilayers were assembled by using estimates for the 
number of lipids in the inner and outer monolayer based on their radii. These 
configurations were then relaxed by simulating the system in an ensemble that 
allowed the cylinder length to dynamically vary, keeping the tension along the 
cylinder axis at zero, resulting in a radius of 6.2 nm and an inside-to-outside lipid 
ratio of 126:205. The cylinders used in our simulations had a length of 38.8 nm at 
zero axial tension and consisted of 7,200 lipids. 

To explicitly test the effects of the insertion of the PH domains of the dynamin 
complex, we modelled the PH domains as amphiphilic hexagonal disks consisting 
of one layer of polar particles connected to one layer of hydrophilic particles. Each 
layer had three particles per edge and an edge length of 1.3 nm, and the two layers 
had a separation of 0.6 nm. The particles in each peptide disk were held together by 
a network of stiff, elastic bonds”*. 

To constrict the lipid cylinder, we arranged the peptide disks into the ring 
system described previously**. Twelve disks were restrained at positions equally 
distributed on a ring forming a belt around the cylindrical lipid bilayer (Extended 
Data Fig. 6a). Only the disks’ centres of mass were restrained, while the orienta- 
tions of the disks could freely change in response to interactions with the lipids. 
To represent one ‘rung’ of the dynamin spiral formed by the protein dimers*, we 
used pairs of such rings separated by Ax ~ 0.45nm. This distance was smaller 
than the disk size so the disks from the juxtaposed rings overlapped while pre- 
serving their independent mobility (Extended Data Fig. 6a). This way the disk 
pair created a flexible membrane-interacting surface imitating the adaptive mem- 
brane wedging by a pair of PH domains of dynamin dimers. The radius of one of 
the two juxtaposed rings was slightly smaller (Ar = 0.45-0.9 nm) so that the disk 
pair exerted a direct influence on the orientation of the membrane at the location 
of the peptides, thus stimulating formation of an hourglass-shaped lipid mor- 
phology”®. Two of the juxtaposed ring pairs situated 9nm apart (separation 
between the inner rings corresponding to ~10 nm separation between the mid- 
points of the ring pairs) constituted the ring system used in simulations 
(Extended Data Fig. 6a). 

In simulations with restrained disks, the ring radii and the separation between 
the rings were fixed and the position of the disks on the rings were tied to their 
respective anchor points with a harmonic potential”®. In the experiments with 
gradually changing separation between the rings, we fixed the position of one ring 
pair and slowly moved the other pair away along the axis of the membrane 
cylinder. In other simulations, we effectively ‘disassemble’ the rings by omitting 
the positional restraints and allowing the disks to move freely along the membrane 
surface after the stable hemi-fission intermediate has formed. 

A more detailed description of the peptide model and the simulation setup can 

be found elsewhere”. 
Stability and rupture of the hemi-fission intermediate. The time unit in our 
simulations, obtained from the self-diffusion coefficient for lipid at room temper- 
ature™’, was t = 2 ns. The characteristic time for a local relaxation process in the 
cylindrical bilayer system described can be estimated as ~100r (ref. 26). The total 
lifetime of the wormlike micelle obtained in the restrained system (9nm ring 
separation) under zero tension was 30,200t (four independent simulations). 
Disassembly of the rings did not produce rupture of the micelle (5,900). To probe 
the stability of this unrestrained system we applied small axial tension. The system 
remained stable for 18,2007 under 0.06 dyn cm ! tension (Extended Data Fig. 7; 
three independent simulations) and for 15,8007 under 0.12 dyn cm! tension (two 
independent simulations), indicating that moderate membrane tensions are not 
sufficient to make the hemi-fission intermediate unstable. From the lifetime of the 
hemi-fission state, the lower boundary for the barrier separating hemi-fission from 
fission can be estimated to be on the order of ~10 kgT (where kg is the Boltzmann 
constant and T is ambient temperature); however, the exact pathway(s) of the 
membrane transformations leading to complete fission and the corresponding free 
energy profiles require further investigation. 

To induce rupture of the pre-formed hemi-fission intermediate we applied an 
axial tension of 0.6 dyncm ’, typical for the planar bilayer systems used in the 
experiments. This tension produces immediate (lifetime <100t, five independent 
simulations) rupture in the unrestrained systems and also destabilized the 
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restrained systems, although much less efficiently (lifetime of 3,400 + 1,400t, s.d.; 
three independent simulations). To augment the effect of tension we add an 
additional axial force-factor by slowly (~0.03 nm per Tt) increasing the separation 
distance between the two double rings. This ring movement augmented the ten- 
sion effect so that immediate rupture (<100t; three independent simulations) was 
produced under 0.06 dyn cm tension. The pathways of the hemi-fission rupture 
explored here are summarized in Extended Data Fig. 7. 
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Extended Data Figure 1 | Domain structure and biochemical 
characterization of dynamin constructs. a, Domain structure of dynamin and 
cartoon illustrating that the GTPase domain (G domain, blue) connects 
through the bundle signalling element (BSE), composed of the N- and 
C-terminal helices of the G domain and the C-terminal helix from GED 
(yellow) to the stalk formed by the middle domain and GED (magenta). The 
pleckstrin homology domain (PHD, green) interacts with membrane lipids. 
b, c, Basal (b) and assembly-stimulated (c) rates of GTP hydrolysis for 0.5 uM 
WT-Dyn1 and CW-Dyn1 before and after BODIPY conjugation. Data are 
shown as average + s.d., n = 3. 
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Extended Data Figure 2 | Role of P294 in BSE conformational dynamics. 
a, Changes in emission intensity of BODIPY-labelled CW-Dyn1 and CW- 
Dyn1(P294A) due to loss of PET after addition of 1 mM GMPPCP. Although 
the BSE partially opens upon addition of GMPPCP, its movements are 
constrained relative to wild type by the mutation of P294. b, Assembly- 
stimulated GTPase activity of 0.5 uM P294A, P294G and P294V Dyn1 
measured on 100 nm liposomes relative to WT-Dyn1. The mutants show near 
wild-type activity, indicating their ability to self-assemble onto and tubulate 
liposomes (data shown as average + s.d., n = 4). c, Fission activity of 0.5 1M 
P294A, P294G and P294V Dyn1 relative to WT-Dyn1 measured as the 
percentage of total membrane released from SUPER templates during 30 min 
incubation in the presence of GTP (data shown as average + s.d., n = 3). 
Substitution of P294 with the more rigid valine residue has a greater effect on 
fission activity. 
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Extended Data Figure 3 | Characterization of CxC-Dyn1. a, Concentration _ the presence of GMPPCP visualized by negative stain. Insets: top view, rings; 


dependence of the specific GTPase hydrolysis rates of WT-Dyn1 (filled side view, short spirals (arrows). These rings are reminiscent of those previously 
squares), CC-Dyn1 (filled circles) and CxC-Dyn1 (open triangles) measuredin observed with WT-Dyn1 only in the presence of transition state nucleotide 
solution at 1 mM GTP (data shown as average + s.d., n = 3). b, EM analogues (for example, GDP’ AIF, )°?. Unlike CxC-Dyn1, CC-Dyn1 
micrographs (representative images from four independently prepared remained unassembled in the presence of GMPPCP (data not shown). Scale 


samples) showing CxC-Dyn1 assembled into rings and short spirals (arrows) in _ bars, 100 nm. 
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Extended Data Figure 4 | Negative-stain and cryo-EM images of CC- and 
CxC-Dyn1 assembled onto PS liposomes in the absence of nucleotides. 

a, b, Negative-stain (a) and cryo-EM (b) images are shown. Note the disordered 
nature of CxC-Dyn1 spirals relative to CC-Dyn1 structures seen via negative 
stain in a. Scale bars, 100 nm. b, Arrow points to relatively ordered CxC-Dyn1 
assemblies, while arrowheads point to sparse dynamin assemblies appearing 
as single or double rings. 
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Extended Data Figure 5 | Altered membrane interactions of the CxC-Dyn1 
transition-state conformer. a, Fluorescence emission spectra of 0.1 4M CC- 
Dyn1 or CxC-Dyn1 (donor) as well as dansyl-PE (acceptor)-containing 
liposomes (5 LM total lipid; 90 mol% PS, 10 mol% dansyl-PE) upon excitation 
at 280 nm. FRET between the PH domain Trp residues and dansyl is evident 
in the donor plus acceptor samples as a decrease in donor and an increase in 
acceptor emission. b, Self-assembly of the indicated proteins (1 1M) on 
liposomes identical to those used in the GTPase assay (300 1M total lipid; 
Fig. 3f) examined by sedimentation followed by SDS-PAGE analysis of the 
supernatant (S) and pellet (P) fractions. c, Percentages of proteins pelleted after 
incubation with or without 400 nm PS liposomes (1 1M protein, 300 WM 
total lipid) and 1 mM GTP, as indicated, was quantified by sedimentation 
followed by SDS-PAGE and densitometric analyses of the protein levels in 
supernatant and pellet fractions (data shown are average + s.d., n = 3). 
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Extended Data Figure 6 | Coarse-grained approach to modelling localized 
membrane constriction by CxC-Dyn1. a, Schematic representation of the 
geometry of the ring system used to produce local constriction of a prototype 
membrane tube. Two pairs of rings are shown, each formed by two closely 
juxtaposed rings (separated by a small distance Ax). The inner ring in each pair 
has the radius r and the outer ring has a slightly larger radius r+ Ar so that the 
ring pair promotes creation of an hourglass membrane shape. The PHDs of 
dynamin are represented as amphiphilic disks evenly distributed over the rings 


independent 
orientation 


with the centre of mass of each disk being restrained to a position on the ring 
(marked by blue and orange points). Two overlapping disks (purple and 
brown) attached to the right juxtaposed ring pair are shown. The orientations of 
the disks are not fixed, so the normal to the disk surface (purple and brown 
arrows) can have an arbitrary direction. b, Axial cross-section of a stable hemi- 
fission intermediate, the cylindrical micelle, created by the ring system 

shown in a. The rectangular box indicates the dimensions of the cylindrical 
micelles (the diameter D and the length L). 
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Extended Data Figure 7 | Molecular simulations of the hemi-fission and 
fission transformations. The red box shows a representative sequence of 
simulation snapshots (axial cross-sections) demonstrating the formation of the 
stable hemi-fission intermediate”. Radial constriction of a membrane tube 
resulted in reversible closure of the tube lumen, that is, flicker”®, followed by 
formation ofa stable cylindrical micelle structure. The blue box summarizes the 
simulation runs exploring the stability of the hemi-fission intermediate and its 
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rupture. The top part shows stable structures corresponding to the constrained 
intermediate (left, taken at zero tension) and the unconstrained intermediate 
(right, taken at 0.06 dyncm' tension). The bottom part shows the rupture 
of the intermediates by 0.6 dyn cm’ tension (left and right) or by elongation of 
the ring system at 0.06 dyncm * tension (middle). The characteristic times 
for the rupture are indicated near the corresponding blue arrows. 
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Extended Data Figure 8 | Dynamin-catalysed membrane fission occurs in _ flickering hemi-fission state (solid blue curve) through the assembly of small 


two mechanistically distinct stages through a hemi-fission intermediate. scaffolds and enhanced wedging activity of the PHD. However, without 
Model overlaying the distinct dynamin activities and conformational changes _ subsequent GTPase-driven conformational changes required to loosen the 
onto the two energy barriers (green curve) that must be overcome, first to scaffold, generate axial force and retract the PHD, as occurs for WT-Dyn1 
catalyse formation of the metastable hemi-fission intermediate and (dotted blue line), the membrane-bound CxC-Dyn1 creates an insurmountable 


subsequently to drive full fission. When trapped in the transition state and in _ barrier to fission (dashed blue line). 
the absence of GTP, CxC-Dyn1 can drive the formation of a metastable and 
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CDA directs metabolism of epigenetic nucleosides 
revealing a therapeutic window in cancer 


Melania Zauri', Georgina Berridge’, Marie-Laétitia Thézénas*, Kathryn M. Pugh”, Robert Goldin’, 


Benedikt M. Kessler* & Skirmantas Kriaucionis' 


Cells require nucleotides to support DNA replication and repair 
damaged DNA. In addition to de novo synthesis, cells recycle 
nucleotides from the DNA of dying cells or from cellular material 
ingested through the diet. Salvaged nucleosides come with the 
complication that they can contain epigenetic modifications. 
Because epigenetic inheritance of DNA methylation mainly relies 
on copying of the modification pattern from parental strands”, 
random incorporation of pre-modified bases during replication 
could have profound implications for epigenome fidelity and 
yield adverse cellular phenotypes. Although the salvage mecha- 
nism of 5-methyl-2’deoxycytidine (5mdC) has been investigated 
before**, it remains unknown how cells deal with the recently 
identified oxidized forms of 5mdC: 5-hydroxymethyl-2’ deoxycyti- 
dine (5hmdC), 5-formy-2’deoxycytidine (5fdC) and 5-carboxyl- 
2'deoxycytidine (5cadC)’"°. Here we show that enzymes of the 
nucleotide salvage pathway display substrate selectivity, effectively 
protecting newly synthesized DNA from the incorporation of epi- 
genetically modified forms of cytosine. Thus, cell lines and animals 
can tolerate high doses of these modified cytidines without any 
deleterious effects on physiology. Notably, by screening cancer cell 
lines for growth defects after exposure to 5hmdC, we unexpectedly 
identify a subset of cell lines in which 5hmdC or 5fdC administra- 
tion leads to cell lethality. Using genomic approaches, we show 
that the susceptible cell lines overexpress cytidine deaminase 
(CDA). CDA converts 5hmdC and 5fdC into variants of uridine 
that are incorporated into DNA, resulting in accumulation of DNA 
damage, and ultimately, cell death. Our observations extend cur- 
rent knowledge of the nucleotide salvage pathway by revealing the 
metabolism of oxidized epigenetic bases, and suggest a new thera- 
peutic option for cancers, such as pancreatic cancer, that have CDA 
overexpression and are resistant to treatment with other cytidine 
analogues". 

Modified cytidines can enter deoxynucleotide pools, because salvage 
and nutrient uptake pathways can recover nucleosides, rather than 
simpler degradation products such as uric acid in the salvage of 
purines’”. Previous biochemical work has suggested that 5mdC is 
not incorporated in the DNA, but is salvaged as thymidine*®. 
Salvage of oxidized 5-methylcytosine variants has not been previously 
characterized. We rationalized that, if nucleosides are recovered in 
unphosphorylated forms (through import) or monophosphate forms 
(through intracellular hydrolysis), the barrier restricting their incorp- 
oration into the DNA may lie in the nucleotide salvage enzymes or 
DNA polymerases. Providing cells with a final substrate for DNA 
polymerases, in the form of deoxynucleoside triphosphate, would 
allow decoupling of DNA synthesis from salvage enzyme activity. 
Therefore, we transfected two human cancer cell lines—MDA- 
MB-231 and H1299—with 5-hydroxymethyl-2’deoxycytidine tripho- 
sphate (ShmdCTP), isolated DNA and analysed the base composition 
by a high-performance liquid chromatography-ultraviolet (HPLC- 
UV) method, using a set of nucleoside standards for calibration 


(Fig. 1a). After ShmdCTP transfection, two additional nucleosides 
were observed in the hydrolysed DNA that correspond to 5hmdC 
and 5hmdU (Fig. 1b, c and Extended Data Fig. 1b). This indicates 
that DNA polymerases can incorporate 5hmdC into DNA, and 
also demonstrates strong deaminase activity acting on either the 
nucleotide or the incorporated base, resulting in the presence of 
5-hydroxymethyluracil (ShmUra) in the DNA. The capacity for 
DNA polymerases to use 5hmdCTP was also evident in an in vitro 
replication assay’’ (Fig. 1d), demonstrating that human DNA poly- 
merases are not selective against the incorporation of 5hmdC into 
DNA. Therefore, if salvage pathways can convert pre-existing sources 
of ShmdC into their nucleotide triphosphate forms, this could result in 
their incorporation into cellular DNA and potentially lead to deleteri- 
ous effects on the epigenome. 

The final triphosphate form of cytidine in a cell is produced by 
sequential phosphorylation by three classes of cytidine kinases. First, 
deoxycytidine kinase (DCK) produces a monophosphate, which is 
then converted into a diphosphate by cytidine monophosphate kinases 
(CMPK1 and CMPK2), and subsequently converted into a triphos- 
phate by the family of nucleoside diphosphate kinases'*. Because 
nucleoside diphosphate kinases phosphorylate both purine and pyr- 
imidine nucleosides!*, and CMPK2 is found in the mitochondria’®, we 
directed our efforts towards examining the substrate selectivity of DCK 
and CMPK1. Recombinant DCK was able to transfer the phosphate from 
ATP[y-*P] to 5mdC, 5hmdC and 5fdC, but not to 5cadC (Fig. le and 
Extended Data Fig. 1d), while CMPK1 phosphorylated only unmodified 
cytidine monophosphate (Fig. le). In agreement with previous work on 
5mdC (ref. 4), we can conclude that the inability of CMPK1 to create 
diphosphates of modified nucleotides provides the main barrier to the 
formation of respective dCTPs, limiting their availability for DNA poly- 
merases, which can instead accept modified dCTPs. 

Given this inherent selectivity of the nucleotide salvage pathway 
kinase CMPK1 for unmodified cytidine, we proposed that the intro- 
duction of abundant biologically modified cytidine variants would 
have little adverse effect on the physiology of a cell, unless they sig- 
nificantly impaired nucleotide metabolism. First, we determined that 
biological cytidine variants retain 70-100% of their original form 
after incubation in water and cell culture media for 10 days at 37 °C, 
while 80% of the synthetic variant 5-aza-2'deoxycytidine (5azadC) 
decomposed by day2 in agreement with previous observations” 
(Extended Data Fig. 2a-c). Next, a panel of 19 human cell lines was 
selected, sampling various tissue origins and p53 mutation statuses’ 
(Extended Data Fig. 2d). When cell growth media was supplemented 
with 10 1M 5hmdC or dC, most of the cell lines continued to prolif- 
erate at a normal rate. However, two cell lines (HOP-92 and MDA- 
MB-231) unexpectedly ceased to proliferate in the presence of 5amdC 
(Fig. 2a). We found that 10 1M 5hmdC was lethal and 1 1M 5hmdC 
caused mild growth inhibition (Fig. 2b). Interestingly, 5fdC was more 
potent at 1 and 10 UM doses in the MDA-MB-231 cell line, but showed 
the same cell line selectivity as 5amdC (Fig. 2b). 


1Ludwig Cancer Research, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7DQ, UK. Target Discovery Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7FZ, 
UK. 3Structural Genomics Consortium, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7DQ, UK. “Centre for Pathology, Imperial College, London W2 1NY, UK. 
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Genetic alterations or gene expression differences could modify the 
response of a cell line to biologically modified cytidine variants. By 
comparing the existing gene expression profiles of the cell lines (NCI60 
and CCLE projects’’”°) that we established as sensitive to modified 
cytidine variants to two randomly chosen resistant ones, we identified 
1,380 differentially expressed genes (P<0.01, >2-fold change). 
Notably, by focusing on differentially expressed genes known to be 
involved in nucleoside metabolism, we identified cytidine deaminase 
(CDA) overexpression in the 5hmdC-sensitive cell types, which had the 
ninth lowest P value of all the genes (Fig. 2c and Supplementary 
Table 1). None of the other known genes involved, either in nucleoside 


transport or cytidine recycling, were differentially expressed (Fig. 2c). 
To identify other cell lines with CDA overexpression, we ranked the 21 
available cell lines according to their CDA messenger RNA levels 
(Fig. 2d). SN12C and Capan-2 cell lines had the highest expression 
levels of CDA, and this was confirmed at the protein level by western 
blot (Fig. 2e). Examination of ShmdC and 5fdC tolerance revealed that 
a 10M dose substantially inhibited the growth of both cell lines, 
suggesting that the expression level of CDA is predictive of cytotoxicity 
for these epigenetic cytidine variants (Fig. 2e). 

To determine whether CDA overexpression is necessary for select- 
ive cytotoxicity, we manipulated CDA levels in the identified cell lines. 
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Cell lines (MDA-MB-231 and SN12C) with stable short hairpin RNA 
(shRNA) knockdown of CDA were able to survive 10 uM 5hmdC 
(Fig. 3a and Extended Data Fig. 2e). Furthermore, stable overexpres- 
sion of CDA in normally 5hmdC-resistant cell lines (H1299 and 
MCE-7) induced substantial growth inhibition (Fig. 3b and 
Extended Data Fig. 2f). These experiments clearly established that 
CDA overexpression is predictive, necessary and sufficient for cyto- 
toxic activity. In vitro measurements of recombinant CDA protein 
activity were performed with various cytidine variants. First, we deter- 
mined that CDA deaminates 5mdC, 5hmdC and 5fdC, but not 5cadC, 
creating thymidine and respective variants of uridine (Extended Data 
Fig. 2g, h). Second, reaction kinetic data fitted well with a pseudo zero- 
order kinetics model (R* > 0.9) revealing that, after deoxycytidine, 
the second best substrate (that is, with the second highest turnover 
number (k.at)) for CDA is 5fdC (Fig. 3c, d and Extended Data Fig. 2h). 
This was unexpected, because the catalytic activity does not follow a 
simple relationship with the dimensions of the 5’ modification as it 
does in the case of AID and APOBEC enzymes”. Molecular docking of 
cytidine variants to the CDA structure” suggested that 5fdC docks to 
the catalytic site with nearly 180° rotation when compared to unmodi- 
fied cytidine, retaining the amino group position close to the active 
site containing Zn** (Extended Data Fig. 3a). By contrast, 5amdC 
docks in the active site by displacing the amino group, which provides 
a potential explanation for the lower catalytic turnover observed 
(Extended Data Fig. 3a). 

The deamination of dC and 5mdC results in dU and T, which are 
the normal precursors for thymidine triphosphate synthesis. Con- 
versely, deamination of 5hmdC and 5fdC produces 5hmdU and 
5fdU, respectively, which are not canonical nucleosides. When phos- 
phorylated and incorporated into DNA, 5hmdU and 5fdU are toxic to 
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the cells (Extended Data Fig. 3b) as they are recognized as damaged 
bases and trigger extensive uracil glycosylase activity resulting in DNA 
breaks”’. Therefore, we asked whether the uptake of 5hmdC in CDA- 
overexpressing cells leads to its conversion into 5hmdU and to its 
incorporation into DNA, potentially explaining cell-type-specific leth- 
ality. First, we determined activities of thymidine kinase and thy- 
midylate kinase on ShmdU and 5fdU. In contrast to the inability of 
CMPK1 to act on equivalent cytidine variants, thymidine kinase and 
thymidylate kinase phosphorylated both uridine variants (Fig. 3e). 
Notably, the corresponding triphosphates are not substrates nor 
potent inhibitors of dUTPase, a robust enzyme that removes dUTP 
from cells (Extended Data Fig. 3c). Finally, analysis of the genomic 
DNA composition of 5hmdC- and 5fdC-treated MDA-MB-231 cells 
identified 5amUra and 5fUra, but no detectable change in 5-hydroxy- 
methylcytosine (ShmCyt) or 5-formylcytosine (5fdCyt) levels in the 
DNA (Fig. 3f, Extended Data Figs 3d, e and 4a-c). Overall, in all the cell 
lines examined, a linear correlation was observed between CDA 
expression and the amount of 5hmUra in the DNA after treatment 
with 5hmdC (Extended Data Fig. 4d). Signs of extensive DNA damage 
were detected by phosphorylated H2AX (yH2AX) staining in 5amdC- 
treated CDA-overexpressing cells (MDA-MB-231). By contrast, a cell 
line expressing low CDA levels (H1299) had no obvious yH2AX stain- 
ing (Fig. 3g and Extended Data Fig. 5c, d). Also, increased numbers 
of cells in S and G2 phases of the cell cycle were observed in CDA- 
overexpressing cell lines, consistent with cell cycle arrest triggered by a 
DNA damage response (Extended Data Fig. 5a). We did not observe 
deviations in the dNTP pools of treated cells, indicating that the cell 
death is likely to be caused by extensive base excision by SMUG] DNA 
glycosylase, which recognizes 5hmUra and 5fUra triggering repair and 
DNA double-stranded breaks (Extended Data Fig. 6). Together, these 


Figure 3 | Molecular mechanism of CDA- 
dependent cytotoxicity of cytidine variants. 
a, Western blot showing knockdown of CDA by 
shRNA (using sh-CDA) in the MDA-MB-231 cell 
line. Right panel illustrates growth curves of 
derived stable cell lines after treatment with 10 11M 
5hmdC (n = 3). (0) and (8) indicate two different 
shRNA constructs used for the experiments, and 
sh-luc denotes a non-targeting control shRNA that 
targets luciferase. WT, wild type. b, Western blot 
showing overexpression of CDA after lentiviral 
transduction of H1299 cells with a construct 
overexpressing CDA (CDA_dsRed). Right panel 
shows the growth curve after treatment with 10 uM 
5hmdC (n = 3). ¢, CDA activity fitted to the 
Michaelis-Menten model. Right panel shows a 
zoomed-in curve, when 5hmdC was used as a 
substrate. d, k.,; values of CDA supplied with 
cytidine variants. e, TLC separation of reaction 
products of thymidine kinase 1 (TK1) and 
thymidylate kinase (DTYMK), which were exposed 
Re - to different modified uridine substrates. xDP 
Ss indicates diphosphates; xMP, monophosphates. 
Treatments f, HPLC-UV chromatogram of nucleosides from 
DNA of MDA-MB-231 cells treated with 10 uM 
5hmdC or dC for 3 days. Right panel shows the 
abundance of 5hmdU relative to T (n = 3, t-test, 
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**P = 0.0057). g, YH2AX immunofluorescence in 
MDA-MB-231 and H1299 cell lines at day 3 after 
treatment with 10 4M 5hmdC or dC. Scale 

bars, 50 lum. Below are quantifications of cells 
showing positive signals (n = 3 , t-test, P = 0.0017). 
DAPI, 4’,6-diamidino-2-phenylindole. All error 
bars denote s.d. 


observations demonstrate that CDA deaminates 5hmdC and 5fdC, 
creating 5hmdU and 5fdU, respectively, which are incorporated into 
the DNA, leading to cell cycle arrest and eventually death. 

CDA overexpression has been linked to resistance to cytidine 
analogues—such as gemcitabine, cytosine arabinoside or 5-azacytidine— 
that are currently used in cancer treatment, presenting a major obstacle 
to their use’’***°. Our observations about biological nucleoside var- 
iants demonstrate an opposite effect: CDA overexpression sensitizes 
cells to otherwise non-toxic 5hmdC and 5fdC. Because cancers origin- 
ating in the pancreas’, stomach, testis and vagina have upregulated 
CDA expression”® (Extended Data Fig. 7a, b), we postulated that the 
administration of 5hmdC and 5fdC could have a selective activity 
against these tumour cells. We first tested whether cytotoxic activity 
is cell autonomous for CDA-overexpressing H1299 cells in the pres- 
ence of wild-type (CDA-low) H1299 cells. Both 5hmdC and 5fdC were 
able to eliminate CDA-overexpressing cells selectively, suggesting that 
secreted CDA or 5hmdU is insufficient for cytotoxicity (Fig. 4a). 
Tolerance to and the stability of SamdC and 5fdC in vivo was deter- 
mined in immunocompromized BALB/cOlaHsd-Foxn1"”"" mice 
after they received a range of doses (12.5 to 100 mgkg_') of ShmdC 
and 5fdC by intraperitoneal injection. Half an hour after injection, 
we were able to detect 5hmdC and 5fdC in the bloodstream, and to 
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quantify 5hmdC using RapidFire mass spectrometry (Extended Data 
Fig. 7c, d). We observed no adverse effects on behaviour, injection site, 
weight or histology in the panel of tissues studied, even though some 
tissues (kidney and intestine) express CDA (Extended Data Fig. 7e-h, 
data not shown). To determine whether cytidine variants have an effect 
on tumour growth in proliferating cells, we subcutaneously injected 
H1299 wild-type and CDA-overexpressing cells into each side of an 
animal, which was later treated with 5hmdC or 5fdC (Fig. 4b). 
Xenografts with CDA overexpression grew slightly slower (reaching 
64% of wild-type tumour volume), and the volume of the tumour 
was further reduced twofold in animals treated with 5hmdC or 5fdC 
(Fig. 4c). CDA-overexpressing tumours showed a twofold decrease in 
the number of proliferating cells and a threefold increase in the number 
of cells with DNA damage in 5fdC-injected animals, but smaller dif- 
ferences in animals that were injected with 5hmdC (Fig. 4d). Similar 
CDA-dependent effects on tumour volume and proliferation were 
observed when SN12C wild-type and SN12C CDA knockdown cells 
were used in the xenograft assay (Extended Data Fig. 8). 

Here, we have characterized the metabolism of newly discovered 
biologically modified nucleosides, leading to a model in which the 
selectivity of CMPK1 prevents random incorporation of modified 
cytosines (Fig. 4e). Notably, we have discovered that 5hmdC and 


Figure 4 | In vivo evaluation of cytidine variants 
and the proposed model of epigenetic nucleoside 
variants in the nucleoside recycling pathway. 

a, Wild-type and CDA-overexpressing H1299 cells 
were mixed at equal ratios and exposed to the 
indicated variants of cytidine. Representative 
histogram (left) and quantification of the results 
(right) are shown (n = 3, 10,000 events recorded). 
Lower concentrations of 5fdC were used to demon- 
strate higher cytotoxic potency. b, Schematic 
illustration of xenograft establishment and 
treatment with nucleoside variants. D, days. 

c, Volume of tumours, calculated by assuming 
that tumours were spheres with their diameters 
measured using Vernier calipers (n = 8 in 5fdC and 
n=7 in 5hmdC experiments, two-way analysis 
of variance (ANOVA) with repeated measures 
Holm-Sidak correction, P< 0.0001). Dissected 
tumours are illustrated below. d, Evaluation of 
proliferation (immunofluorescence, H3PS10) and 
DNA damage (immunofluorescence, YH2AX) in 
dissected tumour samples. Scale bar, 50 um (n = 4, 
one-way ANOVA, H3PS10: **P = 0.0057, 
yH2AX: *P = 0.0491 (ShmdC versus PBS), 

***D — 0.0001 (5fdC versus PBS)). Error bars 
denote s.d. e, Model of metabolism of epigenetic 
nucleoside variants. 
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5fdC, but not 5cadC, are deaminated by CDA at different rates, result- 
ing in the formation of cytotoxic 5hmdU and 5fdU. Our data on 
oxidized epigenetic bases are similar to the proposed mechanism of 
5mdC salvage, in which CMPK1 is rate-limiting in the production of 
the diphosphate, whereas 5mdC deamination produces a normal 
T (refs 4-6). We did not observe any adverse effects during the 
administration of 5amdC and 5fdC in mice, presumably because the 
cytotoxic threshold is only reached in highly proliferating and CDA- 
overexpressing cells, in which there is substantial incorporation of 
nucleoside variants in the DNA, reflected by the CDA-dependent 
regression of xenografts. Together with recent publications dem- 
onstrating the importance of and therapeutic opportunities targeting 
MTH1, which surveys damaged nucleosides”’*’, our data extend the 
current understanding of the metabolism of biological cytidine var- 
iants and provide a novel avenue for cancer therapy. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 

Purification of DCK, CMPK1, CDA, TMPK and DUT. Human DCK with a 
carboxy-terminal 6XHis tag was cloned in pET28a(+) and expressed in 
Escherichia coli BL21 RIPL (Life Technologies) for 4h at 37 °C following induction 
with 1mM isopropyl B-p-1-thiogalactopyranoside (IPTG) in LB. The bacterial 
pellet was resuspended in 50 mM sodium phosphate, pH 8, 300mM NaCl and 
protease inhibitors (Complete EDTA-free, Roche). The protein was bound to a Hi- 
TRAP HP 5ml column (GE Healthcare) and eluted with a linear gradient of 
0-500 mM imidazole in the lysis buffer, supplemented with 10% glycerol. The 
fractions were assessed by electrophoresis and ones containing the protein were 
pooled, concentrated with Amicon 3-kDa centrifugal filter units (Millipore) and 
separated on a HiPrep 16/60 Sephacryl S-200 gel filtration column (GE 
Healthcare). The protein was again concentrated using Amicon columns, supple- 
mented with final 10 mM dithiothreitol (DTT) and 40% of glycerol, snap-frozen 
and stored in aliquots at —-80°C. Human CMPK1 was tagged at the C terminus 
with 6X His and purified using a similar workflow to DCK with the following 
exceptions: the lysis buffer was 50 mM Tris, pH 7.5, 10 mM NaCl and protease 
inhibitors (Complete Mini, Roche); after the gel-filtration step the protein was 
bound to an anion exchange column HiTrap Q HP 5ml (GE Healthcare) and 
eluted with a 20-column-volumes linear gradient of 0-1_M NaCl. The salt was 
removed by dialysis in 50 mM Tris, pH 8, the protein concentrated and 10 mM 
DTT added to the final preparation, before storage in 40% glycerol at —80 °C. 
C-terminal 6 x His-tagged CDA was purified in a similar workflow to CMPK1 with 
the following modifications: the protein was expressed for 19 h at 37 °C; the lysis 
buffer was 50 mM Tris, pH 7.5, 1 mM DTT, 1 mM EDTA and protease inhibitors 
(Complete Mini, Roche); following HiTRAP purification, 6X His tag was cleaved 
by thrombin (Sigma); cleaved tag and uncleaved protein was removed by separa- 
tion using a HiTRAP HP 5ml column and collection of the flow-through. 
Subsequently, the protein was purified using gel filtration as indicated above 
and stored in aliquots at —80 °C. Protein purity was assessed by electrophoresis 
and CDA was additionally identified by mass spectrometry. Human TMPK1 with 
a C-terminal 6X His tag was cloned in pET28a(+) and expressed in E. coli BL21 
RIPL (Life Technologies) for 4h at 37 °C following induction with 1 mM IPTG in 
LB. The bacterial pellet was resuspended in 50mM sodium phosphate, pH 7, 
300 mM NaCl and protease inhibitors (Complete EDTA-free, Roche) and lysed 
with a French press (EmulsiFlex C5, Avestin) at ~100 MPa equipped with a 
recirculating cooler (F250, Julabo) set at 4°C. The protein was bound to a Hi- 
TRAP HP 5ml column (GE Healthcare) and eluted with a linear gradient of 
0-500 mM imidazole in the lysis buffer, supplemented with 10% glycerol. The 
fractions were assessed by electrophoresis and ones containing the protein were 
pooled, concentrated with Amicon 3 kDa centrifugal filter units (Millipore), sup- 
plemented with 40% of glycerol, snap frozen in aliquots and stored at —80 °C. 
Human DUT with a C-terminal 6XHis tag was cloned in pET28a(+) and 
expressed in E. coli BL21 RIPL (Life Technologies) for 4h at 37°C following 
induction with 0.2mM IPTG in LB. The bacterial pellet was resuspended in 
20 mM sodium phosphate pH 7.3, 150 mM NaCl, 1% Triton X-100 and protease 
inhibitors (Complete EDTA-free, Roche) and lysed with a French press 
(EmulsiFlex C5, Avestin) at ~100MPa equipped with a recirculating cooler 
(F250, Julabo) set at 4°C. The protein was bound to a Hi-TRAP HP 5 ml column 
(GE Healthcare) and eluted with a linear gradient of 0-500 mM imidazole in the 
lysis buffer, supplemented with 10% glycerol. The fractions were assessed by 
electrophoresis and ones containing the protein were pooled, concentrated with 
Amicon 3kDa centrifugal filter units (Millipore), supplemented with 40% of 
glycerol, snap frozen and stored in aliquots at —80°C. Thymidine kinase was 
purchased and the purity assessed by SDS-PAGE (8180-TK-050, R&D Systems). 
Nucleoside stability. Nucleosides were obtained from the following sources: 
5hmdC (PY-7588, Berry & Associates), 5fdC (PY-7589, Berry & Associates), 
5cadC (PY-7593, Berry & Associates), 5azadC (A3656, Sigma Aldrich), ATP 
solution (Thermo Fisher), [y-°*P]ATP (Perkin Elmer), dC (Sigma Aldrich, 
D3897), dCMP (Sigma Aldrich, D7625), 5hmdCTP (Bioline, BIO-39046). 
100 LM solutions of 5hmdC, 5fdC and 5azadC were prepared in HPLC-grade water 
(Thermo Fisher) or in DMEM (Lonza). The solutions were incubated at 37 °C for 
10 days. A sample was taken every 24h and subjected to HPLC-UV analysis. 
Enzyme assays. The substrate selectivity of DCK and CMPKI1 kinases were 
measured by **P transfer and detection using 1D or 2D TLC. 1g of DCK 
was incubated in 100mM Tris, pH7.5, 100mM KCl, 10mM MgCh, 1mM 
[y-**P]ATP and 200 uM of the respective nucleoside in a 50 pl reaction volume 
at 37°C for 2h. 1l of products was separated via 2D TLC on glass-backed 
AVICEL cellulose plates (Analtech) as described*’. CMPK1 was assayed through 
a coupled assay with DCK following the conditions described previously” with 
1 pg DCK, 1 pg CMPK1 and 1 mM substrate. Thymidine kinase (8180-TK-050, 
R&D Systems) and TMPK1 were assayed through a coupled assay with 1 pg 
thymidine kinase, 1 4g TMPK1 and 1mM substrate in 50mM Tris, pH7.4, 


LETTER 


50mM KCI, 5mM MgCl, 1mM ATP and 2.511Ci [y-*’PJATP at 37°C. 1D 
TLC was performed using glass-backed TLC sheets (PEI cellulose F, Millipore) 
as described previously’. The plates were exposed to a storage phosphor screen 
(GE Heathcare), which was scanned using Phosphoimager (Biorad) and images 
analysed with ImageLab software (Biorad). CDA kinetic activity data was collected 
as described’’ by monitoring the absorbance at 260 nm with a spectrophotometer 
(SpectraMax M2, Molecular devices) using 45 ng of enzyme (500 ng for ShmdC) 
and the data fitted according to pseudo zero order Michaelis-Menten enzyme 
kinetic model by Prism software (GraphPad). 1 1g DUT was assayed in 50 mM 
Tris, pH7.5, 4mM MgCh, 1mM DTT, 0.1 mg ml ! BSA with 5 pM of substrate 
in 40ul reaction volume for 10min at 37°C. The generated pyrophosphate 
was detected with a bioluminescent coupled assay (PPiLight inorganic pyropho- 
sphate assay LT07-500, Lonza). The plate was then read in a GloMax instrument 
(Promega). 

Molecular docking. A tetramer was generated with CDA structure 1MQ0 (ref. 22) 
and subject to DockPrep in Chimera 1.8 (http://www.cgl.ucsf.edu/chimera). 
Substrates were dC (ZINC18286013)"*, ShmdC (ZINC77300654)"* and 5fdC 
(CSID:10291642) (http://www.chemspider.com) downloaded as .mol files and 
subjected to .mol2 files conversion in Chimera. Docking was subsequently per- 
formed with SwissDock (http://www.swissdock.ch/docking)**. The model with the 
lowest AG of ligand was then visualized and analysed with Chimera. 

In vitro replication assay. The assay was carried out following protocols for 
nuclear extract and cytoplasmic fraction preparation and for the replication 
assay’*"°. The reaction contained 0.3mM of each canonical nucleotide, except 
dCTP which was substituted by 5hmdCTP. The reaction was stopped with the 
addition of 0.1M EDTA final. DNA was extracted with phenol and chloroform, 
treated with RNase A/T1 (Thermo Fisher) and free nucleotides removed with a 
Mini Quick Spin DNA column (Roche) before HPLC assay. 

DNA glycosylase assay. The single stranded DNA oligonucleotide substrates (5'- 
FAM CATAAAGTGXAAAGCCTGGA, in which X denotes uracil, 5hmUra or 
5fUra) were purchased from AtdBio and their complementary strand from IDT 
(all HPLC purified). Recombinant human SMUGI1 (NEB) was incubated with 
annealed oligonucleotides as described before”. The reaction products were 
resolved on a 15% denaturing polyacrylamide TBE-urea gel (Invitrogen) and 
quantified using ChemiDoc (BioRad) with blot detection protocol for Alexa 488. 
Quantification of nucleosides by HPLC. Genomic DNA was extracted with 
Gene Jet Genomic DNA extraction Kit (Thermo Fisher) or TRI Reagent (Sigma 
Aldrich), incubated with RNase A/T1 (Thermo Fisher) in buffer 2 (NEB), phenol/ 
chloroform extracted and precipitated with ethanol. 1-10 ug of DNA was hydro- 
lysed as described before’. Nucleosides were resolved with an Agilent UHPLC 
1290 instrument fitted with Eclipse Plus C18 RRHD 1.8 km, 2.1 X 150mm col- 
umn and detected with Agilent 1290 DAD fitted with a Max-Light 60 mm cell. 
Buffer A was 100 mM ammonium acetate, pH 6.5; buffer B was 40% acetonitrile, 
and the flow rate 0.4mlmin™'. The gradient was between 1.8-100% of 40% 
acetonitrile with the following steps: 1-2 min, 100% A; 2-16 min 98.2% A, 1.8% B; 
16-18 min 70% A, 30% B; 18-20 min 50% A, 50% B; 20-21.5 min 25% A, 75% B; 
21.5-24.5 min 100% B. 

Quantification of nucleotides by HPLC. MDA-MB-231 and H1299 cells were 
treated with 10 uM dC, 10 4M 5hmdC and 1 1M 5fdC. Metabolites were extracted 
at day 3 as described before”. In brief, cells were washed in PBS and scraped on ice. 
The pellet was washed again in cold PBS and extraction was done with 50 kl of ice- 
cold 50% ACN per mg of pellet. The samples were vortexed and incubated on ice 
for 10 min. Insoluble material was pelleted at 20,000g for 10 min and supernatants 
were dried using a SpeedVac (Thermo Scientific). Metabolites were dissolved in 
30 pl of buffer A and 20 il was used for chromatography. HPLC was performed as 
described** with some minor modifications as listed below. Nucleotides were 
resolved with an Agilent UHPLC 1290 instrument fitted with Eclipse Plus C18 
RRHD 1.8 im, 2.1 X 150 mm column and detected with Agilent 1290 DAD fitted 
with a Max-Light 60 mm cell at 254, 260 and 280 nm. Buffer A consisted of 100 mM 
KH>PO, (60221, Sigma) with 8 mM tetrabutylammoniumbisulfate (98479, Sigma) 
set at pH 5.5. Buffer B consisted of buffer A with 25% methanol. After 8 min at 0% 
buffer B, the gradient started with a linear increase of buffer B to 35% in 19 min, 
followed by a linear increase from 35% to 38% buffer B in 5 min and from 38% to 
100% buffer B in 22 min. After an 8-min hold at 100% buffer B, the gradient was 
reversed from 100% to 0% buffer B in 2 min, followed by a hold at 0% buffer B for 
2 min. The column temperature was set at 30 °C and the flow rate was 0.4 ml min’. 
The compounds were identified by comparing their retention times and their UV 
spectra with those of known standards, which were purchased from Sigma-Aldrich. 
The integrated area was used to quantify the relative abundance of nucleotides by 
normalizing each peak area to the ADP area as an indication of loaded amount. 
Cell culture and transfections. Cell lines were routinely tested for mycoplasma 
contamination using Lonza Mycoalert kit. Cell proliferation assays were done by 
seeding cells in p60 plates or in a T25 flask with appropriate concentrations of 
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5hmdC, 5fdC or dC in the growth media. The cells were passaged, counted and the 
media was replaced every 2 days. Before counting, 1 volume of Trypan blue solu- 
tion (Lonza) was added to an aliquot of single cell suspension. The live cells were 
counted by TC-20 Cell Counter (Bio-Rad). NTPs were introduced by nucleofec- 
tion. One million MDA-MB-231 cells were nucleofected with 50 mM 5hmdC ina 
100-1 volume using an Amaxa nucleofector kit (Lonza), following the manufac- 
turer’s instructions. After transfection, cells were seeded in a 6-well plate, 24h later 
washed twice with PBS, and 48h later DNA extracted for HPLC analysis. 
Production of stable cell lines. Stable cell lines were generated via lentiviral 
infection using a standard protocol”? with second generation packaging plasmids 
(pCMV-VSVG, pCMV-dR8.9, a gift from B. Amati). CDA knockdown was 
achieved by infecting MDA-MB-231 and SN12C cell lines with pLKO.1 vectors 
containing five different shRNA constructs (SHCLND-NM_001785, Sigma- 
Aldrich) and a control pLKO.1 containing shRNA silencing luciferase (a gift from 
X. Lu). Infected cells were selected by incubation with 1.5 gml~' puromycin 
(Sigma) for 60h. Two cell lines with the lowest CDA mRNA levels (shRNA 
TRCN0000051290 and TRCN0000051288, designated (0) and (8), respectively) 
were further assessed by immunoblotting and used for experiments. Lentivirus for 
CDA overexpression was generated with pLenti-puro (39481, Addgene, I.-M. Shih 
laboratory) expressing dsRed-IRES-CDA. H1299 and MCE-7 were infected as 
above. Infected cells were selected with puromycin at 2 1g ml * for 60h. 
Immunoblotting, FACS and immunofluorescence. For western blot analysis, 
10° cells were lysed with RIPA buffer (20 mM HEPES at pH 7.5, 300 mM NaCl, 
5mM EDTA, 10% glycerol, 1% Triton X-100, supplemented with protease 
inhibitors (Complete EDTA-free, Roche)) and sonicated. Cleared lysates were 
electrophoresed and immunoblotted with the following primary antibodies: 
anti-CDA (Sigma, SAB1300717 1:250), anti-actin (Abcam, ab185058 1:75000). 
Chemiluminescent detection, after incubation of the membranes with appropriate 
secondary antibodies, was done through a CCD camera using the ChemiDoc 
System (Bio-Rad) with Image Lab software (Bio-Rad, version 4.0). For FACS 
analysis, 5 X 10° cells were trypsinized, washed in PBS and fixed in 70% ethanol 
for 1h on ice. The pelleted cells were resuspended in 250 ul of staining solution 
(504g ml’ propidium iodide (P4864, Sigma), 0.1 mgml~* RNaseA and 0.05% 
Triton X-100) and incubated at 37°C for 40 min. Controls were used for G1 
(serum starvation overnight) and G2 (0.1 pg ul? nocodazole overnight). 
Fluorescence of 10,000 cells was recorded with a FACS Canto flow cytometer 
(BD Biosciences) and analysed using FlowJo software (Version 7.6.5, TreeStar). 
For immunofluorescence, cells were grown on coverslips and fixed with 4% para- 
formaldehyde for 20 min at room temperature. Cells were washed twice in PBS 
and permeabilized for 10 min in 0.2% Triton X-100. After two washes in PBS, cells 
were blocked for 1h in 3% BSA (Sigma Aldrich), dissolved in PBS and incubated 
with yH2A.X antibody (Millipore, 05-636, 1:500) overnight at 4 °C in a humidified 
chamber. Cells were then washed three times in PBS and incubated with anti- 
mouse secondary antibody conjugated with Alexa546 (1:400, Life Technologies) 
and DAPI (Sigma Aldrich). Coverslips were then washed three times in PBS and 
mounted with mounting media (Vectashield). Tiled pictures were automatically 
taken with a Zeiss 710 microscope with a 20X lens. The amount of nuclear 
fluorescence was quantified using ImageJ. 

Gene expression analysis and public data sets. Data sets used in the study: Gene 
Expression Omnibus (GEO) accessions GSE36139 (GPL15308)”° and GSE32474 
(GPL570)*°. Gene expression analysis comparing was done on the data from 
the NCI-60 panel’’ as follows. Affy HG-U133 Plus 2.0 microarray data was 
downloaded from CellMiner database (http://discover.nci.nih.gov/cellminer/ 
loadDownload.do) and cel files were extracted for triplicate experiments done 
on BR:MCF7, ME:MDA_MB_ 435, BR:MDA_MB_231 and LC:HOP_92 cell lines. 
Data was then imported into ArrayStar v11 (DNAStar) and signal normalization 
and intensity correction was done using RMA Quantile method. Experiment was 
designed by grouping BR:MCF7, ME:MDA_MB_435 cell lines into ‘resistant’ 
group and BR:MDA_MB_231 and LC:HOP_92 cell lines into ‘sensitive’ group. 
Differential expression between the groups was determined using the Student’s 
t-test with Benjamini-Hochberg multiple testing correction. Genes were called as 
differentially expressed when P< 0.01 and fold change >2. The full data set is 
included in Supplementary Table 1. To derive CDA expression values in tumours, 
GPL15308 (ref. 20) and GPL570 (ref. 40) were analysed directly on the NCBI 
portal with GEO2R. P values were adjusted with Benjamini-Hochberg correction. 
Toxicology and dose determination in animal experiments. Animal work was 
done after approval by the UK Home Office and University of Oxford Local 
Ethical review. Three 5-7-week-old BALB/cOlaHsd-Foxn1"™" (Harlan) mice 
per dose were injected (intraperitoneally) with 25, 50 and 100mgkg™' of 
5hmdC and 12.5, 25, 50 and 100 mgkg ? of 5fdC. Animals were monitored for 
any deviations from normal behaviour. At 30 min post-injection, a few drops of 
blood were collected through tail vein bleeding using Microvette CB300 (Sarstedt) 
to assess the amounts of the compounds in the bloodstream. 


RapidFire mass spectrometry analysis of serum samples. Serum was isolated by 
centrifugation of Microvettes according to the recommendations of the manufac- 
turer (Sarstedt). The samples were brought up to 200 ul with water and three 
volumes of methanol, and 150 ul of chloroform was added. After intense vortex- 
ing, 450 pl of water was added, samples were vortexed again and centrifuged at 
14,000g for 1 min. The aqueous phase containing the soluble molecules was col- 
lected and dried in a Speedvac (Thermo Scientific). The dried pellets were then 
resuspended in 10 1 water, then 3 11 diluted further into 50 ul of water to load ona 
RapidFire 360 high throughput sample delivery system coupled to a 6530 quad- 
rupole time-of-flight (QTOF) mass spectrometer (Agilent). The samples were 
aspirated by vacuum at —40 bar for 400 ms into a 10-11 sample loop and loaded 
onto a graphitized carbon solid phase extraction cartridge running buffer 5 mM 
ammonium formate at a flow of 1.5mlmin '. The matrix components not 
retained on the cartridge were diverted to waste for 4,500 ms, and the retained 
components eluted with 95% acetonitrile 5 mM ammonium formate for 4,500 ms 
at a flow of 1 ml min '. The SPE was then re-equilibrated for 4,500 ms with 5 mM 
ammonium formate. Data were collected in positive electrospray ionisation (ESI) 
mode using a 2 Gb data configuration, gas temperature 300 °C, drying gas 81 min“, 
nebuliser gas ~240 MPa, Vcap 3,500 V and fragmentor voltage 175 V. The amount 
of nucleoside was measured against a standard curve produced by dissolving known 
amounts of 5hmdC and 5fdC in serum and processed as indicated above. Data were 
analysed using an Agilent Mass Hunter Qualitative (vB.06) and Quantitative (vB.05) 
analysis software. Standard curve analysis was determined by using a quadratic curve 
fit algorithm for each nucleoside with an R* > 0.98 in all instances. 

Nucleoside analysis by mass spectrometry (HPLC-QTOEF). Samples were dried 
in a speed-vac and re-suspended in 10 ul of water. For the analysis by HPLC- 
QTOF mass spectrometry, a 1290 Infinity UHPLC was fitted with a BEH C18 XP 
Column, (130A, 1.7 um, 2.1mm X 150 mm; Waters) and coupled to a 6560 Ion 
mobility QTOF LC/MS mass spectrometer (Agilent Technologies) equipped with a 
Jetstream ESI-AJS source. The data were acquired in QToF mode using positive 
electrospray ionisation (ESI+). Two reference ions, m/z 121.0508 and 922.0097 
were used as internal standards. The Dual AJS ESI settings were as follows: gas 
temperature: 150 °C, the drying gas: 51 min~ | nebulizer 240 MPa, sheath gas tem- 
perature 360 °C, sheath gas flow 121 min — - Vcap 4,000 V and nozzle voltage 300 V. 
The fragmentor of the mass spectrometer TOF was set to 275 V. 

The gradient used to elute the nucleosides started by a 1-min isocratic gradient 
composed with 99.5% buffer A (10 mM ammonium acetate, pH 6) and 0.5% buffer B 
(composed of 40% CH3CN) with a flow rate of 0.350 ml min and was followed by 
the subsequent steps: 1-2 min, 98.2% A; 2-16 min 80% A; 16-18 min 50% A; 18- 
20 min 25% A; 20.20-21.5 min 0% A; 21.5-22.5 min 100% B; 22.5-24.5 min 99.5% B. 
The gradient was followed by a 5 min post time to re-equilibrate the column. 

The raw mass spectrometry data was analysed using the MassHunter Qual 
Software package (Agilent Technologies, version B7.0), and the masses/retention 
times used for the characterization of nucleosides and their adducts are summar- 
ized in Supplementary Table 2. For the identification of compounds, raw mass 
spectrometry data were processed using the molecular feature extraction function 
in the MassHunter software, followed by metabolite searching through mass/ 
isotope matching using the PCDL software (version B.07.00 build 7024.0) and 
the METLIN database (https://metlin.scripps.edu/index.php). For each nucleos- 
ide, precursor ions corresponding to the M+H, M+Na, M+K, 2M and base only 
species were extracted, and the most intense ion species observed for each nuc- 
leoside was used for quantification. Identities of peaks eluting at 4.5 and 5.1 min 
(Figs 1b and 3f) are shown in Extended Data Figs 9 and 10. 

Subcutaneus xenografts. Animal work was done after approval by the UK Home 
Office and University of Oxford Local Ethical review. Power calculations sug- 
gested 9-6 animals per group if we were to observe a significant 50% difference 
in tumour size with power of 90% and s.d. between 40 and 30%. One million cells 
in a 50% suspension of MatriGel (200 il) were injected into 5-7-week-old BALB/ 
cOlaHsd-Foxn1"”"" (Harlan) mice, 8 animals per group in each flank following 
the scheme: SN12C/H1299 left, SN12C shCDA8/H1299 dsRedCDA right. When 
the tumours reached palpable size, 8 mice were assigned randomly to each treat- 
ment group: PBS, 100 mg kg’ of ShmdC and 100 mg kg 5fdC. The compounds 
were administered every 72h (four doses in total). Tumour size was measured 
every 3 days by Vernier caliper and the animal cohort euthanized when the cumu- 
lative tumour diameter in the first animal reached 12 mm. The experimenter was 
unaware of the cell line genotypes during the measurements. Tumour volume was 
calculated assuming that the tumours were spheres with the following formula: 
4/3n (D/2)°, in which D represents the diameter of the tumour. 

Histology. Organs and tumours were collected and immediately fixed in 10% 
formalin for 48h. They were then embedded in wax and 4-|1m thick sections cut. 
All sections were stained with H&E. Tumours were additionally stained with a 
Masson’s Trichrome Stain Kit (Sigma Aldrich) according to the manufacturer 
instructions. 
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Immunofluorescence of tissues and tumours. The 4-11m thick sections were 
subjected to antigen retrieval with a pressure cooker in Tris buffer, pH9 
(10 mM Tris base, 0.05% Tween 20). They were then blocked in 3% BSA in PBS 
for 30 min and incubated overnight in a humidified chamber at 4°C with the 
following antibodies: yH2A.X (Millipore, 05-636, 1:200) and PH3 (Millipore, 
06-570, 1:200) or B-catenin (BD Transduction Laboratories, 610153, 1:250) and 
CDA (Sigma Aldrich, SAB1300717, 1:100). The slides were then washed vigor- 
ously three times in PBS and incubated for 1h at room temperature with an 
appropriate secondary antibody, Alexa546 and Alexa488 conjugated (1:400, Life 
Technologies) and DAPI (Sigma Aldrich). Coverslips were then washed three 
times in PBS and mounted with mounting media (Vectashield). Images were 
acquired with a Zeiss 710 confocal microscope with a X20 objective. For quan- 
tification of DNA damage and proliferation in tumours, tiled images with Z stacks 
were acquired to cover the entire central section of the tumour. Image J was used to 
quantify the immunofluorescence signal. 
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Extended Data Figure 1 | DNA polymerase and nucleoside kinase activities  d, Two-dimensional TLC images of DCK reaction products. Dotted lines 
on modified nucleosides. a, Mass spectrometry confirmation of 5amdC, 5fdC indicate reference points, which aid in tracking the migration localization of 
and 5cadC in the purchased nucleosides. b, HPLC-UV chromatogram of the nucleosides. The monophosphate in each reaction is circled in red 
nucleosides from DNA extracted from H1299 cells transfected with 5hmdCTP. (representative picture, n = 3). e, Schematic map of nucleoside migration on 
The abundance of 5hmdC relative to dG is illustrated in the right panel (n = 3). _ two-dimensional TLC plate (asterisk indicates a background spot coming 
n.d., not detected. Error bars denote s.d. c, Coomassie-stained SDS-PAGE from ATP and used as a reference point) 

gel of recombinant purified DCK and CMPK1 enzymes used in the study. 
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Extended Data Figure 2 | Stability of the nucleosides and CDA activity. 

a, b, Quantification of nucleosides by HPLC-UV during 10 days of incubation 
in water (a) and DMEM (b) at 37 °C (n = 3). c, Representative HPLC-UV 
chromatograms at days 0, 2 and 10 with retention times indicated above each 
peak. d, Cell lines used in the study and their characteristics. e, Western blot 
showing knockdown of CDA by shRNA in the SN12C cell line. Right panel 
illustrates the growth of the cell line during treatment with 10 uM 5hmdC 

(n = 3). f, Western blot showing expression of CDA in wild-type and 


LETTER 


lentivirally transduced MCEF7 cell line. Growth curve after treatment with 

10 LM 5hmdC is shown on the right (n = 3). g, Coomassie-stained SDS-PAGE 
gel of recombinant purified CDA enzyme used in this study. h, HPLC-UV 
chromatograms showing the retention times and identity of substrates and 
CDA-catalysed products. i, List of Km; Kcat and Vmax Values of catalytic activity 
of CDA catalysing the deamination of cytidine variants. All error bars 

denote s.d. 
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Extended Data Figure 3 | Mechanism of CDA catalysed deamination of demonstrating recombinant purified DUT (molecular mass, 18 kDa) and 
epigenetic nucleosides, their cytotoxicity and dUTPase activity. a, Molecular _ in vitro measurements of dUTPase activity using non-canonical uridine 
docking of dC, 5hmdC and 5fdC on the CDA active site (Protein Data Bank triphosphates (n = 3). d, Extracted ion chromatogram of nucleoside standards 
(PDB) accession 1MQ0). The detailed view of the catalytic pocket is shown analysed by HPLC-QTOF mass spectrometry. Each nucleoside intensity was 
with the modified nucleoside in the centre. Chains A, B and C indicate units of | measured using the merged m/z values of the [M+H] + [M+Na]*, [M+H]", 
the tetramer, which CDA forms to deaminate four nucleosides. Thin yellow [2M+H]" and [base+H]* anda symmetric single m/z expansion of +0.02. 
lines show compatible distances for the formation of hydrogen bonds. e, The most prominent ion of 5hmdU was identified in 5hmdC-treated 

b, Growth curves of H1299 and MCF7 cell lines treated with 10 and1 uM ofdC, © MDA-MB-231 cells. All error bars denote s.d. 

5hmdU and 5fdU over a period of 10 days (n = 3). c, Coomassie-stained gel 
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Extended Data Figure 4 | Mass spectrometry identification of 5fUra and 
ultraviolet quantification of 5hmdU in the DNA. a, Extracted ion 
chromatogram of nucleoside standards with 5fdU analysed by HPLC-QTOF 
mass spectrometry (as in Extended Data Fig. 3d). b, Weak, but consistent signal 
of 5fUra is identified in DNA of 5fdC-treated MDA-MB-23]1 cells, but not 
dC-treated cells or buffer alone. Two representative examples are shown. 
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c, Relative quantification of 5fUra signal from three biological mass 
spectrometry replicates. d, Relationship between measured 5hmdU/T in the 
DNA of cell lines treated with 10 1M 5hmdC for 3 days and CDA expression 
levels. The cell lines used in this study are in coloured font (n = 3). All error 
bars denote s.d. 
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Extended Data Figure 5 | Effect of 5amdC administration on the cell cycle 
and DNA damage. a, b, Propidium iodide FACS assay of the cell cycle. Shown 
are two representative plots of MDA-MB-231 cells at day 3 of treatment 

with dC and 5hmdC (10 1M) (a) and quantification for all the cell lines 
analysed (n = 3) (b). Two-way ANOVA: P = 0.0027 (S: 5hmdC versus dC 
MDA-MB-231), P = 0.0149 (G2-M: 5hmdC versus dC MDA-MB-231). 
HOP-92 P< 0.0001, P = 0.0005 (S: 5hmdC versus dC Capan-2), P< 0.0001 
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(G2-M: 5hmdC versus dC Capan-2) (n = 3; 10,000 events acquired). c, YH2AX 
immunofluorescence in MDA-MB-231 and H1299 cell lines at day 3 after 
treatment with 10 4M 5hmdC or dC. Scale bar, 50 tum. d, Fraction of cells 
showing a YH2AX signal above background (n = 3). ANOVA with 

Sidak correction for multiple comparisons: P = 0.0208 (ShmdC versus dC 
MDA-MB-231), P = 0.0135 (ShmdC versus dC HOP-92). Error bars 

denote s.d. 
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Extended Data Figure 6 | Quantification of intracellular nucleotides byion- _ indicated experiments (blue) overlaid with standards separated on the same 
pair HPLC and SMUGI glycosylase activity. a, Illustrative chromatogram run (red). e, Typical image of denaturing PAGE electrophoresis of DNA 

of all standards indicated in b mixed together. b, Retention times of nucleotides incubated with SMUG1 and cleaved with APE1. f, Quantification of the DNA 
were determined by analysing each standard separately and are indicated in oligonucleotides with excised bases. g, Expression of SMUG] and uracil 

the table. c, An average relative abundance of NTP and dNTP levels in cells DNA glycosylase (UNG) in MDA_MB_231, SN12C and Capan-2 cell lines 
treated with dC, 5hmdC and 5fdC (n = 3). d, Representative chromatograms of (Genevestigator). Error bars denote s.d. 
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Extended Data Figure 7 | CDA expression in human cancer and normal 
tissues, and toxicity evaluation of 5hmdC and 5fdC in mice. a, CDA 
overexpression in pancreatic cancer (t-test, P< 0.0001). b, CDA expression 


across a panel of cancer (red) versus normal (green) tissues (GENT database). 


Arrows indicate cancer types with an evident difference between normal (N) 
and cancerous tissues (C). c, 5omdC and 5fdC detection in the blood 

(mass spectrometry) of intraperitoneally injected mice at 30 min after 
injection. d, Label-free mass spectrometry quantification of 5hmdC in the 
blood of animals injected with doses of 25, 50 and 100 mgkg | (n = 3 

(100 mg ml ') and n= 4 (25 and 50 mg ml ')). Error bars denote s.e.m. 
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e, Immunohistochemistry showing CDA expression in the intestine. 

f, Haematoxylin and eosin staining of the intestine of mice injected with PBS 
and 100mgkg ' of ShmdC and 5fdC. Tissue was removed 5 days after the 
injection. g, Immunofluorescence evaluation of proliferation (H3PS10) and 
DNA damage (yH2AX) in the intestine of mice treated with PBS and 
100mgkg ~ of ShmdC and 5fdC 5 days after treatment. In parallel, the 
protocol was done on testis of irradiated mice, where positive signals for 
H2AX were observed (data not shown). Scale bars, 50 um (e-g). h, Weight of 
the mice plotted over the treatment period (n = 16 per group). 
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Extended Data Figure 8 | Evaluation of wild-type SN12C cell line and CDA 
knockdown in a mouse xenograft model. a, Schematic illustration of 
xenograft establishment and treatment with nucleoside variants. b, Tumour 
diameter was measured by Vernier caliper and volume calculated by assuming 
that tumours were spheres (n = 8, two-way ANOVA with repeated 

measures and Holm-Sidak correction, P < 0.0001). c, Photos of the dissected 
tumours (asterisks indicate dissected lymph nodes found after histological 
analysis). d, Western blot showing CDA expression in tumours extracted from 


Protein extract from tumour 
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mice. e, Quantification of proliferation (H3PS10) and DNA damage (yH2AX) 
using confocal microscopy and ImageJ of the central section of the tumour. 
Scale bar, 50 jim (n = 4, one-way ANOVA, SN12C H3PS10: P = 0.0033 

(PBS versus 5hmdC), P = 0.0046 (PBS versus 5fdC); yH2AX: P = 0.0003 (PBS 
versus 5hmdC), P = 0.0436 (PBS versus 5fdC); SN12CshCDA_8: P = 0.0130 
(PBS versus 5hmdC)). f, 5amdU quantified from a HPLC-UV chromatogram 
of nucleosides from DNA extracted from tumours of mice treated with 
5hmdC and PBS (n = 4, one-way ANOVA P = 0.0041). Error bars denote s.d. 
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Extended Data Figure 9 | Identification and quantification of compounds’ _ hydrolysis buffer. b, 5-methylcytosine in the DNA does not change after 
resulting peaks in HPLC-UV. a, The abundance of molecule eluting at treatment with 5hmdcC. Identity of 5mdC in the samples was confirmed by 
5.1 min (5.7 min on the HPLC-QTOF) is not significantly different between HPLC-QTOF mass spectrometry. 

dC- and 5hmdC-treated samples. It is a common component of DNA 
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Extended Data Figure 10 | Identification and quantification of compounds’ resulting peaks in HPLC-UV. Compound eluting at 4.5 min (5.0 min on the 
HPLC-QTOEF) is an abundant component of DNA hydrolysis buffer, generating a m/z of 202.18. 
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Protein synthesis by ribosomes with tethered 


subunits 


Cédric Orelle+*, Erik D. Carlson?**, Teresa Szal', Tanja Florin', Michael C. Jewett”? & Alexander S. Mankin! 


The ribosome is a ribonucleoprotein machine responsible for protein 
synthesis. In all kingdoms of life it is composed of two subunits, each 
built on its own ribosomal RNA (rRNA) scaffold. The independent 
but coordinated functions of the subunits, including their ability to 
associate at initiation, rotate during elongation, and dissociate after 
protein release, are an established model of protein synthesis. 
Furthermore, the bipartite nature of the ribosome is presumed to 
be essential for biogenesis, since dedicated assembly factors keep 
immature ribosomal subunits apart and prevent them from trans- 
lation initiation’. Free exchange of the subunits limits the develop- 
ment of specialized orthogonal genetic systems that could be evolved 
for novel functions without interfering with native translation. Here 
we show that ribosomes with tethered and thus inseparable subunits 
(termed Ribo-T) are capable of successfully carrying out protein syn- 
thesis. By engineering a hybrid rRNA composed of both small and 
large subunit rRNA sequences, we produced a functional ribosome in 
which the subunits are covalently linked into a single entity by short 
RNA linkers. Notably, Ribo-T was not only functional in vitro, but 
was also able to support the growth of Escherichia coli cells even in the 
absence of wild-type ribosomes. We used Ribo-T to create the first 
fully orthogonal ribosome-messenger RNA system, and demonstrate 
its evolvability by selecting otherwise dominantly lethal rRNA muta- 
tions in the peptidyl transferase centre that facilitate the translation of 
a problematic protein sequence. Ribo-T can be used for exploring 
poorly understood functions of the ribosome, enabling orthogonal 
genetic systems, and engineering ribosomes with new functions. 

The random exchange of ribosomal subunits between recurrent acts 
of protein biosynthesis presents an obstacle for making fully ortho- 
gonal ribosomes, a task with important implications for fundamental 
science, bioengineering, and synthetic biology. Previously, it was pos- 
sible to redirect a subpopulation of the small ribosomal subunits 
from translating indigenous mRNAs to instead translating a specific 
mRNA by placing an alternative Shine-Dalgarno sequence in a 
reporter mRNA and introducing the complementary changes in the 
anti-Shine—-Dalgarno region in 16S rRNA””, which enabled selection 
of mutant 30S subunits with new decoding properties*. However, 
because large subunits freely exchange between native and orthogonal 
small subunits, creating a fully orthogonal ribosome has been imposs- 
ible, thereby limiting the engineering of the 50S subunit, including the 
peptidyl transferase centre (PTC) and the nascent peptide exit tunnel, 
for specialized new properties. 

The orthogonality of the full ribosome could be hypothetically 
achieved by linking the small and large subunit rRNA into a continu- 
ous molecule. A successful chimaeric 16S-23S construct must (1) 
properly interact with the ribosomal proteins and biogenesis factors 
for functional ribosome assembly; (2) avoid RNase degradation; and 
(3) have a linker(s) sufficiently short to ensure subunit cis-association, 
yet long enough for minimal interference with subunit movement 
required for translation initiation, elongation, and peptide release. 


In the native ribosome, the ends of 16S and 23S rRNA are too far apart 
(>170A) to be connected with a nuclease-resistant RNA linker. 
Therefore, we considered an alternative design in which the 23S 
rRNA would be ‘grafted’ into the 16S rRNA with the bridges connect- 
ing 16S and 23S rRNA sequences located across the rim of the subunits 
interface. To identify potential linking sites, we connected the native 
23S rRNA ends that are proximal to each other, and generated new 
termini at different locations (Fig. 1a). This circular permutation 
approach has been successfully exploited in vitro previously’, and a 
subsequent pilot study showed that three 23S rRNA circular permuta- 
tion variants could assemble into a functional subunit in vivo’. 
We prepared a comprehensive collection of 91 circularly permutated 
23S (CP23S) rRNA mutants with new ends placed at nearly every 
hairpin (Fig. 1b). The CP23S sequences were introduced in place 
of the wild-type 23S rRNA gene of the pAM552 plasmid (Fig. 1a, 
Extended Data Figs 1a and 2), and the resulting constructs were trans- 
formed in the Escherichia coli SQ171 cells lacking chromosomal rRNA 
alleles’. Twenty-two constructs were able to replace the resident plas- 
mid pCSacB carrying the wild-type rRNA operon (Fig. 1b, Extended 
Data Fig. 2d, e and Extended Data Table 1). Most of the viable circu- 
larly permutated variants had new 23S rRNA ends at the subunit 
solvent side, including several locations close to the interface rim 
(Fig. 1c). 

One of the viable mutants (CP2861, Fig. 1b) had 23S rRNA ends 
within the loop of helix 101 (H101), located in the ribosome near the 
apex loop of the 16S rRNA helix 44 (h44) (Figs 1c and 2c). Because the 
length of h44 varies among different species, and its terminal loop 
sequence can tolerate alterations®, h44 was a promising site for grafting 
the CP2861 23S rRNA and generating a hybrid 16S-23S rRNA mole- 
cule (Fig. 2a—c). In the chimaeric rRNA, the processing sequences 
flanking the mature 16S rRNA would remain intact for proper mat- 
uration of the 16S rRNA termini, whereas endonuclease processing 
signals of 23S rRNA would be eliminated, thereby preventing its cleav- 
age from the hybrid molecule. 

The RNA linkers must span the 30-40 A distance between h44 and 
H101 loops and allow for ~10 A subunit ratcheting during protein 
synthesis’? (Fig. 2c and Extended Data Fig. 3). Being unable to estim- 
ate the optimal length of the linkers accurately, we prepared a library of 
constructs, pRibo-T, in which the length of two tethers—T1 connect- 
ing 16S rRNA G1453 with 23S rRNA C2858, and T2 linking 23S 
C2857 with 16S G1454—varied from 7 to 12 adenine residues (Supple- 
mentary Table 2). Notably, plasmid exchange in SQ171 cells yielded 
several slowly growing colonies, and the pattern of extracted RNA 
showed a single major RNA species corresponding to the 16S-23S 
chimaera instead of the individual 16S and 23S bands (Fig. 2d). This 
result suggested that translation in these cells was carried out exclu- 
sively by Ribo-T, and revealed for the first time that the bipartite nature 
of the ribosome is dispensable for successful protein synthesis and 
cell viability. 
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Figure 1 | Global screening of circularly permutated 23S rRNAs identifies 
variants capable of replacing the natural 23S rRNA in a functional 
ribosome. a, The general scheme for constructing the rRNA operon in which 
the mature 23S rRNA gene sequence is replaced with the circularly permutated 
gene (CP23S). b, Secondary structure diagram of 23S rRNA” showing 
circular permutation (CP) constructs tested for their ability to support cell 
growth in the absence of wild-type ribosomes, named according to the number 
of the new 5’ position in the wild-type (WT) 23S rRNA structure (for example, 
CP 104). Viable circular permutation variants are green and italicized, non- 
viable variants are red. To assess the viability of CP mutants, two independent 
attempts to replace wild-type ribosomes with the CP construct were carried 
out. For all viable CP constructs, the lack of wild-type rRNA genes was 
confirmed by PCR as shown in the Extended Data Fig. 2, and the identity of the 
constructs in the resulting clones was verified by sequencing. c, The location 
of the new 5’ ends (spheres, viable in green, non-viable in red) of CP variants of 
the 23S rRNA in the crystallographic structure of the E. coli 70S ribosome" 
(Protein Data Bank (PDB) accession code 4V9D). The loops of helices h44 and 
H101 in the small and large subunit rRNA, respectively, used for subsequent 
experiments, are indicated by arrows. 


The linker combinations 8A/9A or 9A/8A (for T1/T2) were found 
in the six best-growing clones. The first combination showed slightly 
better behaviour in some subsequent experiments and was chosen 
for further investigation (pRibo-T plasmid, Extended Data Fig. 1b). 
The original SQ171/pRibo-T clones, although viable, grew slowly 
(doubling time 107 + 3 min compared to 35 + 1 min for SQ171 cells 
expressing wild-type ribosomes), exhibited poor recovery from the 
stationary phase, and low cell density at saturation (Extended Data 
Fig. 4a). By passaging cells in liquid culture for approximately 100 
generations, we isolated faster growing mutants. One such clone, 
SQ171fg/pRibo-T (for fast growing), exhibited better growth charac- 
teristics and shorter doubling time (70 +2 min) (Extended Data 
Fig. 4a). PCR and primer extension analysis showed the lack of 
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wild-type rDNA and rRNA, respectively, confirming that every ribo- 
some in this strain was assembled with the tethered rRNA (Extended 
Data Fig. 4b, c). Because the pRibo-T plasmid from the $Q171fg clone 
was unaltered, we sequenced the entire genome and found a nonsense 
mutation in the ybeX gene encoding a putative Mg”*/Co** trans- 
porter, and a missense mutation in the rpsA gene encoding ribosomal 
protein S1 (Extended Data Fig. 4d, e). Either one of these mutations or 
their combined effect must account for the faster growth of SQ171fg/ 
pRibo-T cells (henceforth called Ribo-T cells). 

To establish that protein synthesis in Ribo-T cells was carried out by 
ribosomes with tethered subunits, we carefully examined the integrity 
of Ribo-T rRNA. Analysis of Ribo-T preparations in a denaturing gel 
showed only very faint 16S and 23S-like rRNA bands (marked by 
asterisks in Extended Data Fig. 5a), possibly reflecting the linker cleav- 
age either in the cell or during Ribo-T isolation. In most of the multiple 
Ribo-T preparations, these cleavage products accounted for less than 
4% of the total Ribo-T rRNA. In some of the preparations, these bands 
were completely absent (for example, lane ‘Ribo-T(1)’ in Extended 
Data Fig. 5a), showing that more than 99% of Ribo-T remained intact. 
Consistently, primer extension across the T1 and T2 linkers did not 
show any major stops attesting to the general stability of the oligo(A) 
connectors (Extended Data Fig. 5d). Protein synthesis rate in Ribo-T 
cells reached 50.5 + 3.5% of that in cells with wild-type ribosomes 
(Extended Data Fig. 6a) and thus cannot be accounted for by a small 
fraction of Ribo-T with cleaved tethers. Unequivocal proof of active 
Ribo-T translation in vivo came from analysis of polysomes prepared 
from Ribo-T cells, in which intact 16S-23S hybrid rRNA (rather than 
the products of its cleavage) was associated with the heavy polysomal 
fractions (Fig. 2e). This result provided clear evidence that intact Ribo- 
T composed of covalently linked subunits is responsible for protein 
synthesis in the Ribo-T cells. 2D-gel analysis showed that most of the 
proteins present in SQ171 cells that express wild-type ribosomes are 
efficiently synthesized in the Ribo-T cells (Extended Data Fig. 6). 

We isolated ribosomes with tethered subunits from Ribo-T cells and 
characterized their composition and properties. The tethered ribo- 
some contains an apparently equimolar amount of 5S rRNA and the 
full complement of ribosomal proteins in quantities closely matching 
the composition of wild-type ribosome (Extended Data Fig. 5 b, c). 
Chemical probing showed that the rRNA hairpins h44 and H101 
remain largely unperturbed, while both linkers were highly accessible 
to chemical modification, indicating that they are solvent-exposed 
(Extended Data Fig. 7). 

Sucrose gradient analysis of Ribo-T showed that at 15 mM Mg** 
most of the ribosomal material sedimented as a 70S peak with a minor 
faster-sedimenting peak, which may represent Ribo-T dimers owing to 
cross-ribosome subunit association at a high Mg** concentration 
(Fig. 3a). At lower Mg’* concentration (1.5mM), when the native 
ribosome completely dissociates into subunits, Ribo-T still sediments 
as a single peak with an apparent sedimentation velocity of 65S 
(Fig. 3a). The distinctive resistance of Ribo-T to subunit dissociation 
offers a venue for isolating Ribo-T if it is expressed in cells concomi- 
tantly with wild-type ribosomes. 

We then tested the activity of Ribo-T in the PURExpress in vitro 
translation system lacking native ribosomes’. Ribo-T efficiently syn- 
thesized the 18-kilodalton (kDa) dihydrofolate reductase or super folder 
green fluorescence protein (sfGFP)'* (Fig. 3b). The rate of Ribo-T- 
catalysed protein synthesis reaches approximately 45% of that of the 
wild-type ribosomes (Fig. 3b). To assess which translation step is the 
most problematic for Ribo-T, progression of Ribo-T through a short 
synthetic gene’* was analysed by toe-printing (Fig. 3c). A more pro- 
nounced band of the ribosomes at the open reading frame start codon 
indicated that Ribo-T is impaired in translation initiation at a step 
subsequent to the start codon recognition. Although the true nature of 
this effect will require further investigation, it is unlikely to reflect a lower 
affinity of Ribo-T for initiation factors because higher concentrations of 
IF1, IF2 and IF3 could not rescue the initiation defect (data not shown). 
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Figure 2 | Ribo-T design. a, Wild-type (left) and 
Ribo-T (right) rRNA genes. In Ribo-T, the 
circularly permutated 23S rRNA gene, ‘closed’ at 
its native ends with a four-nucleotide long 
jie an connector (C) and ‘opened’ in the loop of H101, is 
we inserted via short tethers T1 and T2 into the apex 
2 = loop of h44 in the 16S rRNA gene. The resulting 
5S hybrid rRNA gene is transcribed as a single 
chimaeric 16S—23S rRNA, with its 5’ and 3’ ends 
probably processed by the enzymes of 16S rRNA 
maturation. b, Secondary structure of the mature 
wild-type (left) and Ribo-T (right) rRNAs. The 
red dots indicate the apex loops of h44 and H101, 
which in Ribo-T are connected by tethers T1 
and T2. The arrows at the 16S rRNA ends and the 
tethers in the Ribo-T map indicate the direction 
of transcription of the chimaeric 16S-23S rRNA. 
c, Left, the locations of the Tl and T2 tethers in 
T2 the three-dimensional model of Ribo-T (based on 
11 the structure of E. coli ribosome in the unrotated 
state’'; PDB code 4V9D). 16S rRNA is in yellow, 
30S proteins are in orange, 23S and 5S rRNA are 
in blue, 50S proteins are in cyan, P-site-bound 
0 tRNA is in olive, mRNA is in orange, connector 
(C) linking 23S native 5’ and 3’ ends is in green, 
and tethers T1 and T2 are in red. Right, the 
ribosome has been opened up like a book, 
exposing the subunit interface, with helices h44 
(16S) and H101 (23S) highlighted in orange and 
blue, respectively, and ribosomal proteins 
removed for clarity. d, Agarose gel 
electrophoresis of total RNA prepared from 
SQ171 cells expressing wild-type ribosomes or 
Ribo-T. The gel is representative of five 
independent biological replicates. e, Left, sucrose 
gradient fractionation of polysomes prepared 
from cells expressing wild-type ribosomes (top) or 
Ribo-T (bottom). Peaks corresponding to 
monosomes (70S), disomes (P2), trisomes (P3) and 
tetrasomes (P4) are indicated by arrows. Right, the 
agarose electrophoresis analysis of RNA extracted 
from the corresponding sucrose gradient peaks, 
wild-type ribosomes (WT) or Ribo-T (T). 
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To enable a fully orthogonal ribosome-mRNA system, we next 
engineered a Ribo-T version (oRibo-T) committed to translation of 
a particular orthogonal cellular mRNA. The wild-type 16S anti-Shine- 
Dalgarno region was altered from ACCUCCUUA to AUUGUGGUA 
(ref. 3) producing a poRibo-T1 construct. When poRibo-T1 was intro- 
duced in E. coli carrying the sf-gfp gene with the Shine-Dalgarno 
sequence CACCAC cognate to oRibo-T (Extended Data Fig. Ic, 
pLpp50GFP), notable sfGFP expression was observed (Extended 
Data Fig. 8a), demonstrating the activity of oRibo-T. 


Ribosomes prepared from poRibo-T 1-transformed cells (containing a 
mixture of wild-type ribosomes and oRibo-T) translated an orthogonal 
sf-gfp gene in a cell-free system (green dotted line in Extended Data 
Fig. 8b). However, because the orthogonal sf-gfp transcript is the only 
mRNA available during in vitro translation and no native mRNA engage 
wild-type 30S subunits, a fraction of orthogonal sfGFP biosynthesis is 
accounted for by wild-type ribosomes (pink dotted line in Extended Data 
Fig. 8b). Therefore, to isolate oRibo-T1 activity in vitro, we used the 
A2058G mutation in the 23S rRNA portion of oRibo-T, which rendered 
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Figure 3 | Functional characterization of Ribo-T. a, Sucrose gradient 
analysis of wild-type ribosomes (top) and Ribo-T (bottom) under 15 mM 
MgCl (solid line) or 1.5mM MgCl, subunit dissociating conditions (dotted 
line). The peak marked with grey arrow and ‘X’ may represent Ribo-T dimers. 
The result was qualitatively verified in an independent experiment performed 
at Mg? * concentrations 1.5 mM and 10 mM. b, In vitro translation of proteins 
by isolated Ribo-T. Top, SDS-PAGE analysis of the dihydrofolate reductase 
(DHFR) protein synthesized in the Aribosome PURExpress system 
supplemented with purified wild-type ribosomes or Ribo-T (T); wild-type 
ribosomes provided with the kit (WT*) were used as a control. The 
transcription-translation reaction was carried out in the presence of 
[?°S]-methionine in the absence or presence of 50 1M erythromycin (ERY). 
The A2058G mutation in Ribo-T renders the Ribo-T-driven translation 
resistant to the antibiotic. The ‘no erythromycin’ samples are a representative 
result of two independent biological experiments. Bottom, time course of sfGFP 
protein expression in the Aribosome PURExpress system supplemented 

with purified wild-type (black) or Ribo-T (grey) ribosomes. The kop, rates 


ribosomes resistant to macrolide and lincosamide antibiotics (for 
example, clindamycin). The addition of clindamycin to the reaction with 
wild-type ribosomes completely inhibited expression of the reporter 
(pink solid line in Extended Data Fig. 8b), whereas marked sfGFP 
expression was observed in the reaction carrying the oRibo-T prepara- 
tion (green solid line in Extended Data Fig. 8b). Importantly, the unique 
conjoined nature of Ribo-T allows for using antibiotic-resistance muta- 
tions in any of the ribosomal subunits. We demonstrated this by intro- 
ducing a G693A mutation in the small subunit moiety of oRibo-T, 
rendering oRibo-T resistant to pactamycin'’®’”. Pactamycin (100 11M) 
completely inhibited the activity of the wild-type ribosomes in the 
PURExpress translation system, whereas oRibo-T(G693A) remained 
fully active (Extended Data Fig. 8c). The combination of an orthogonal 
translation initiation signal with the antibiotic-resistance mutations 
embedded in oRibo-T allows for exploring unique properties of oRibo- 
T ina cell-free system even in preparations carrying a substantial fraction 
of wild-type ribosomes. 

During subsequent experiments, we fortuitously isolated a mutant 
version of the poRibo-T1 plasmid (poRibo-T2) that contained a single 
mutation in the Py promoter that improved its transformation prop- 
erties and was used thereafter (Extended Data Fig. 9). 

We next demonstrated the evolvability of oRibo-T by selecting the 
gain-of-function mutations in the PTC, which could facilitate trans- 
lation of a problematic protein sequence by the ribosome. The SecM 
polypeptide presents a classic example of an amino acid sequence for 
which translation is problematic for the ribosome’*. The expression of 
the essential SecA secretion ATPase is controlled by programmed 
ribosome stalling at the Prol66 codon of secM. Translation arrest 
ensues because specific interactions of the SecM nascent chain with 
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(385 + 13 relative fluorescent units (RFU) min’ (mean = s.d.) for wild-type, 
177 + 6 RFUmin ' for RiboT) were determined from the initial slopes. The 
activity of both ribosomes was fully inhibited by 50g ml’ chloramphenicol 
(time points indicated by x). Each curve is an average of two independent 
biological replicates, with error bars indicating the s.d. c, Toeprinting analysis of 
translation of a 20-codon synthetic gene RST (ref. 15) by wild-type ribosomes 
or Ribo-T. The antibiotic thiostrepton (THS), present at 50 1M, arrests the 
initiating ribosome at the start codon” (black arrowhead). The threonyl-tRNA 
synthetase inhibitor borrelidin (BOR) arrests translation at the fourth codon 
of RST1 mRNA (grey arrowhead)". The position of a toeprint band that would 
correspond to the ribosome that has reached the RST1 stop codon is shown 
by an open arrowhead. A more pronounced toeprint band at the start codon in 
the samples lacking thiostrepton indicates that Ribo-T departs from the 
initiation codon slower than wild-type ribosomes. A weaker borrelidin-specific 
band observed in the Ribo-T sample suggests that under our experimental 
conditions, fewer Ribo-T compared to wild-type ribosomes were able to reach 
the fifth codon, apparently owing to slower initiation. 


the ribosomal exit tunnel impair the PTC function, preventing the 
transfer of the 165-amino-acid long peptide to the incoming prolyl- 
transfer-RNA (Pro-tRNA). Several mutations in the ribosomal exit 
tunnel (for example, A2058G) have been previously identified as 
relieving translation arrest possibly by disrupting the interactions 
between the nascent chain and ribosome, and rRNA residues in the 
PTC A-site have been proposed to have a key role in the mechanism of 
ribosome stalling’**°. However, exploring the role of the PTC in the 
mechanism of the translation arrest has been impossible so far because 
of the lethal nature of PTC mutations”’””. 

We therefore asked whether the PTC A-site mutations can relieve 
SecM-induced translation arrest. Our interest in testing the use of 
oRibo-T for manipulating the ribosomal A-site was additionally 
fuelled by future prospects of engineering ribosomes capable of pro- 
grammed polymerization of unnatural amino acids and backbone- 
modified analogues. To search for SecM arrest bypass mutations, we 
removed the A2058G mutation from poRibo-T2 and prepared a lib- 
rary of plasmids with mutations at two 23S residues, A2451 and 
C2452. These residues form the amino acid binding pocket in the 
PTC A-site°”’ (Fig. 4b), and their mutations are dominantly lethal 
in E. coli’'”?. We also engineered an orthogonal SecM-based reporter, 
poSML (Fig. 4a and Extended Data Fig. 1d), encoding the SecM arrest 
sequence fused in frame with lacZa gene’* (Fig. 4a). 

Notably, when the C41(DE3) cells capable of «-complementation 
were transformed first with the poSML reporter and then with the 
poRibo-T2(A2451N/C2452N) mutant library, some of the colonies 
gained blue colour on indicator plates (Fig. 4c), demonstrating read- 
through of the SecM arrest sequence in some of the mutants. 
Sequencing 15 bluer colonies showed that they all carried a C2451- 
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Figure 4 | Evolving Ribo-T to identify gain-of-function PTC mutations 
that facilitate synthesis of problematic amino acid sequences. a, The 
SecM-LacZa reporter with an orthogonal Shine-Dalgarno (o0-SD) sequence is 
translated in the cell by oRibo-T. SecM-dependent ribosome stalling prevents 
expression of the lacZa gene unless a ribosomal mutation allows for bypass 
of the SecM arrest site. b, The placement of Phe-tRNAs bound in the P-site 
(orange) and A-site (yellow) of the PTC'®. The conserved 23S rRNA 
residues A2451 and C2452 (blue) form the amino acid side-chain binding 
pocket in the A-site. c, Top, colonies formed on X-gal/isopropyl B-D-1- 
thiogalactopyranoside (IPTG) plates by the E. coli C41 cells transformed with 
the secM-lacZa reporter plasmid and a library of poRibo-T2 plasmids with the 
PTC mutations at positions 2451 and 2452. Bottom, identity of 2451 and 
2452 residues in poRibo-T2 plasmids isolated from randomly picked 16 white 
colonies and 15 blue colonies. d, The E. coli C41 cells transformed with 

the secM-lacZa reporter and individual poRibo-T2 plasmids with different 
nucleotide combinations at positions 2451 and 2452. The transformed cells 
were initially plated on LB agar antibiotic plate without X-gal or IPTG (all 
colonies pale), and three randomly picked transformants were then streaked 
on the shown indicator plate containing X-gal and IPTG. The poRibo-T2 
mutant with the A2058G mutation, which is known to enhance the bypass 


C2452 sequence (the A2451C mutation) in the PTC. By contrast, 
none of the 16 analysed ‘white’ colonies had this sequence, and 
instead exhibited a variety of dinucleotide combinations at positions 
2451-2452 (Fig. 4c). We corroborated these results by individually 
testing all possible 2451-2452 mutants in poSML-transformed 
C41(DE3) cells. Importantly, all the mutants were viable, confirming 
that oRibo-T is suitable for expression of dominantly lethal 23S 
rRNA mutations in vivo, indicating a low degree of cross-association 
of oRibo-T with free wild-type 30S subunits. Consistent with our 
previous result (Fig. 4c), the A2451C mutation confers the most 
pronounced blue colour of the transformants, comparable to that 
seen in cells expressing oRibo-T with the tunnel mutation A2058G 
(Fig. 4d). The A2451U mutation also increased the blue hue of the 
cells although to a lesser extent. These results suggested that the 
A2451C (and A2451U) mutants were not only functional in cellular 
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of the SecM arrest sequence"’, was used as a positive control. A mutation of 
another essential PTC nucleotide (U2585G), which has been proposed to 

be implicated in some translation arrest scenarios”’, showed no effect on SecM 
arrest. The photographs of the agar plates in c and d have been contrast- 
enhanced for better colour separation. e, The A2451C mutation enhances 
bypass of the SecM stalling sequence by oRibo-T in vitro. The orthogonal 
construct containing secM stalling sequence fused in frame to the truncated 
lacZa gene was translated in the Aribosome PURExpress cell-free transla- 
tion system supplemented with wild-type non-tethered ribosomes or 
preparations of oRibo-T (A2451 or C2451). The Ribo-T constructs carried 
the pactamycin-resistance mutation G693A in 16S rRNA, and the reactions 
were carried out in the presence of pactamycin, which, in addition to the 
presence of an orthogonal Shine-Dalgarno sequence, ensured that the reporter 
is translated exclusively by oRibo-T (see the control wild-type lane with 

no translation products). Numbers on the left indicate the size (kDa) of 
molecular mass markers. The bar graph at the bottom shows the efficiency of 
bypass (ratio between the full-size and SecM-arrested translation products). 
A representative gel of two independent experiments is shown, with error 
bars indicating the s.d. 


protein synthesis but also gained the ability to bypass translation 
arrest caused by the SecM sequence. 

We verified in vitro the discovered role of A2451 in the mechanism 
of SecM translation arrest by testing the translation of the orthogonal 
secM-lacZa gene by isolated oRibo-T with and without the A2451C 
mutation. To assure oRibo-T activity only, the pactamycin-resistance 
mutation G693A (refs 16, 17) was introduced into the 16S segment 
of oRibo-T constructs, and cell-free translation in the PURExpress 
system was carried out in the presence of pactamycin. Only a small 
fraction of original oRibo-T was able to bypass the SecM arrest 
signal and synthesize the full-size hybrid protein (Fig. 4e, lane 
oRibo-T/A2451). By contrast, the A2451C mutant was able to bypass 
the SecM arrest site twice as efficiently as the unmodified oRibo-T 
(Fig. 4e, lane oRibo-T/C2451), confirming that the selected (and other- 
wise lethal) mutation in the PTC has improved the ability of oRibo-T 
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to polymerize a polypeptide sequence problematic for wild-type 
ribosomes. These results provide the first, to our knowledge, direct 
experimental evidence of a direct involvement of the PTC A-site in 
the mechanism of nascent peptide-dependent ribosome stalling, and 
suggest that interactions between the proline moiety of Pro-tRNA 
and the A-site rRNA residues are crucial for the SecM-induced trans- 
lation arrest. 

By engineering a ribosome with inseparable tethered subunits, and 
demonstrating its functionality in vivo and in vitro, we have revised 
one of the key concepts of molecular biology: that successful express- 
ion of the genome requires reversible association and dissociation of 
the ribosome into individual subunits. Although the ability of trans- 
lation initiation by 70S ribosome at leaderless mRNAs or via scanning 
re-initiation has been previously demonstrated”*”, it was surprising 
that Ribo-T would be active enough to express the entire bacterial 
genome at a sufficient level for active cell growth and proliferation. 
This finding in turn made possible a fully orthogonal and evolvable 
gene expression system in the cell in which an entire specialized ribo- 
some, not just the mRNA-interacting small subunit, is dedicated to the 
translation of a defined genetic template. As a proof of principle we 
showed that oRibo-T can be used for studying in cells mutations of 
functionally crucial rRNA residues that are dominantly lethal, a task 
that would be difficult or impossible to achieve in any other system. 
This shows that Ribo-T may find important implications in exploring 
poorly understood functions of the ribosome in protein synthesis. 
Furthermore, the opportunity provided by the oRibo-T system to 
modify the catalytic properties of the protein synthesis machine opens 
up exciting prospects for engineered ribosomes with principally new 
properties. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
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METHODS 


No statistical methods were used to predetermine sample size. 

Preparation of circularly permuted variants of the 23S rRNA. The A2058G 
mutation was introduced into the pAM552 plasmid (Extended Data Fig. 1a) by 
inverse PCR using primers 5'-CCGTCTTGCCGCGGGTAC-3’ and 5'-GTGTAC 
CCGCGGCAAGACGGGAAGACCCCGTGAACC-3’ (the underlined sequence 
is complementary to the second primer and the mutation is shown by italicized 
bold character) followed by re-circularization by Gibson assembly reaction’* (all 
primers used in this study were synthesized by Integrated DNA Technology). 
A 23S-A2058G gene with native 5’ and 3’ ends linked by a GAGA tetra-loop was 
generated by inverse PCR using primers 5’-GGTTAAGCCTCACGGTTC-3’ and 
5'-CCGTGAGGCTTAACCGAGAGGTTAAGCGACTAAGCGTAC-3’ (GAGA 
tetra loop in bold) and pAM552-A2058G as template. Purified PCR product 
(50 ng) was circularized by Gibson assembly reaction for 1 h at 50 °C. The resulting 
circular 23S rRNA gene was then cloned at its native unique Eagl restriction site 
(position 1905 in wild-type 23S rRNA gene) into T7-Flag-4 plasmid (Sigma 
Aldrich) as follows. The circularized 23S rRNA gene was amplified by inverse 
PCR using primers 5’-GAGACACAACGTGGCTTTCCGGCCGTAACTATAA 
CG-3' and 5'-CACTCGTCGAGATCGATCTTCGGCCGCCGTTTACC-3’ (added 
homology to the T7-Flag-4 vector underlined) and Gibson-assembled with the T7- 
Flag-4 vector amplified with the primers 5’-AAGATCGATCTCGACGAGTG-3’ 
and 5'-GAAAGCCACGTTGTGTCTC-3’. The cloned circularly permuted 23S 
rRNA gene in the resulting plasmid pCP23S-Eagl containing a pBR322 origin of 
replication and KanR selective marker (Extended Data Fig. 2) was fully sequenced. 

The pCP23S-EagI plasmid was then digested with EagI (New England Biolabs) 
for 1 h at 37 °C, and the circularly permutated 23S rRNA (CP23S) gene was isolated 
from a SYBRSafe-stained 0.7% agarose gel using a E.Z.N.A. Gel Extraction kit 
(Omega). The 23S rRNA was circularized by T4 DNA ligase (New England 
Biolabs) in a 50 pl reaction with 2.5 ng pl’ DNA for 14h at 16°C, followed by 
heat inactivation at 65°C for 10 min. The reaction was diluted 1:100 for use as a 
template in the PCR reactions for generating the circular permutants (Extended 
Data Fig. 2). 

Ninety-one CP23S mutants were designed by introducing new 23S rRNA 5’ and 
3’ ends at most of the apex loops and some internal loops of rRNA helices to assure 
spatial proximity of the new rRNA termini in the fully assembled 50S ribosomal 
subunit. Each CP23S rRNA gene was PCR-amplified in a 40 ul reaction using 
Phusion High Fidelity DNA polymerase (New England Biolabs), with primer pairs 
shown in Supplementary Table 1, and 4 ll of the 1:100 diluted 23S circular ligation 
reaction as template. Each primer pair adds to the 5’ and 3’ ends of the amplified 
CP23S gene 20-base-pair (bp) of homology to the 23S rRNA processing stem 
retained in the target vector pAM552-A23S-AflII (described below). PCR reac- 
tions catalysed by the Phusion High Fidelity DNA polymerase were run under the 
following conditions: 98 °C, 10 min followed by 25 cycles (98 °C, 30 s; 60 °C, 30s; 
72°C, 180s), followed by the final incubation for 15 min at 72 °C. The reaction 
product was purified using E.Z.N.A. Cycle Pure kit (Omega) and the size of the 
amplified DNA was confirmed by electrophoresis in a 1% agarose gel. For circular 
permutations with off target bands (12 in total), the PCR product of the correct size 
was extracted from the agarose gel. 

To minimize PCR errors in generating the vector backbone, which carried 16S 
and 5S rRNA sequences, and prevent carry-through of the wild-type r7mB operon, 
universal backbone vector pAM552-A23S-AfllI lacking the 23S rRNA gene and 
containing added AfilI restriction site for cloning of CP23S was prepared. The 
plasmid pAM552-AfllI was constructed from pAM552 by adding AfilI restriction 
sites within the terminal stem of the wild-type 23S rRNA gene by introducing the 
G2C and C2901G mutations. First, the G2C mutation was introduced by inverse 
PCR using 5’-phosphorylated primers CTTAAGCGACTAAGCGTACAC and 
CTCACAACCCGAAGATGTTTC, followed by blunt-end ligation, transforma- 
tion into E. coli POP2136 electrocompetent cells, plating on LB-agar plates sup- 
plemented with 50 pg ml * carbenicillin, growth overnight at 30 °C, single colony 
isolation and sequencing. The C2901G mutation was added by the same method 
using 5’-phosphorylated primers GCTTACAACGCCGAAGCTG and TTAA 
GCCTCACGGTTCATTAG. The introduced mutations preserved the integrity 
of the 23S rRNA terminal stem and did not affect growth of SQ171 cells expressing 
only ribosomes with the pAM552-AfllI-encoded rRNA (doubling times 53.9 + 1.0 
min for SQ171 cells transformed with pAM552 and 53.3 + 2.4 min for SQ171 
transformed with pAM552-AflII, as determined from four separate colonies each 
on Biotek Synergy H1 plate readers in 96-well flat bottom plates (Costar) in 100 pl 
LB supplemented with 50 pig ml! carbenicillin, 37 °C, linear shaking with 2mm 
amplitude, at 731 cycles per min). To remove the 23S rRNA gene, pAM552-AflII 
was digested with AfllI (New England Biolabs) for 1h at 37°C, the backbone 
portion of the vector was gel-purified and ligated with T4 DNA ligase (New 
England Biolabs) overnight at 16 °C. It was then transformed into POP2136 cells, 
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plated on LB/agar plates supplemented with 50 pg ml‘ carbenicillin, and grown 
at 30 °C. Plasmids from several colonies were isolated and fully sequenced. The 
resulting pAM552-A23S-AflII plasmid contains the 16S rRNA, 23S processing 
stems with an added AflIl restriction site, 5S rRNA, and B-lactamase resistance 
gene and ColE1 ori (Extended Data Fig. 2). Vector backbone was prepared by 
digesting pAM552-A23S-AfllI with AflII restriction enzyme at 37 °C for 2h and 
purification using an E.Z.N.A. Cycle Pure kit. 

All the CP23S constructs were assembled in parallel by Gibson assembly reac- 

tion (Extended Data Fig. 2) in a 96-well PCR plate. For each CP23S target, 50 ng of 
Aflll-digested purified backbone was added to threefold molar excess of the PCR- 
amplified and purified CP23S insert. Gibson assembly mix”* (15 pl) was added, the 
final volumes brought to 48 j1l with nuclease-free water, and incubated at 50 °C for 
1h in the PCR machine. No CP23S insert was added to the negative control 
reaction. To check the efficiency of DNA assembly, 2 ul of selected assembly 
reactions were transformed into electrocompetent POP2136 cells. After 1 h recov- 
ery at 37 °C in SOC media, a quarter of each transformation was plated on LB-agar 
plates supplemented with 501g ml’ carbenicillin and grown for 20h at 30°C. 
A typical CP23S assembly reaction generated 30-120 POP2136 colonies with the 
control reaction generating only few colonies. 
Testing CP23S rRNA constructs. Transformation of SQ171/pCSacB rubidium 
chloride-competent cells was carried out in a 96-well plate. Two microlitres of the 
Gibson Assembly reactions were added to 20 jl competent cells in the pre-chilled 
plate. After a 45-min incubation in ice/water bath, 45 s at 42 °C and 2 min on ice, 
130 pl of SOC medium was added to the wells and the plate was incubated 2h at 
37°C with shaking at 600r.p.m. on a microplate shaker. Forty microlitres of 
medium were then transferred from each well to the wells of another 96-well plate 
containing 120,11 SOC supplemented with 100,gml~' ampicillin and 0.25% 
sucrose. The plate was incubated overnight at 37°C with shaking at 600 r.p.m. 
A 96-pin replicator was used to spot aliquots of the cultures onto a rectangular LB 
agar plate containing 100 jig ml” ' ampicillin, 5% sucrose and 1 mg ml’ erythro- 
mycin. The plate was incubated overnight at 37°C and the appearance of 
Amp’/Ery’ transformants was recorded. The completeness of the replacement 
of the wild-type pCSacB plasmid with the plasmids carrying circularly permutated 
23S rRNA gene was verified by PCR using a mixture of three primers: primer 1 
(5'-GCAGATTAGCACGTCCTTCA-3’) complementary to the 23S rRNA seg- 
ment 50-69), primer 2 (5’-CGTTGAGCTAACCGGTACTA-3’) containing the 
sequence of the 23S rRNA segment 2863-2882, and primer 3 (5'-GGGTGAT 
GTTTGAGATATTTGCT-3’) corresponding to the sequence of the 16S/23S 
intergenic spacer 139-116 bp upstream from the 23S rRNA gene in rrnB 
(Extended Data Fig. 2e). The combination of the primers 1 and 3 produces a 
207-bp PCR band if the wild-type rrn operon is present; the combination of 
primers 1 and 2 produces a 112-bp PCR band on the templates with circularly 
permutated 23S rRNA gene (Extended Data Fig. 2e). 

To reduce the number of false-negative CP23S rRNA variants, the experiment 
was repeated one more time using de novo assembled Gibson reactions with the 
cp23S rRNA constructs that failed to replace pCSacB in the first experiment. Two 
additional functional CP23S rRNA constructs were recovered from the second 
attempt. Altogether, 22 CP23S rRNA variants were able to replace pCSacB in the 
SQ171 cells. CP23S identity was confirmed by plasmid sequencing. Growth rates 
were analysed on Biotek Synergy H1 plate readers in 96-well flat bottom 
plates (Costar) in 100 pl LB with 50 pg ml’ carbenicillin. Doubling times and 
final Agoo nm after 18h are shown in Extended Data Table 1. 

Construction of pRibo-T. To avoid generation of mutations in the 238 rRNA 
gene during PCR amplification for Gibson assembly, the 23S rRNA gene variant 
circularly permuted at H101 (corresponding to CP2861 from Fig. 1) was first 
cloned in the pUC18 vector. For that, the 23S rRNA gene circularly permuted 
at H101 was PCR-amplified from circularized 23S rRNA gene prepared in the 
circular permutation study (see above and Extended Data Fig. 2a) by using the 
high-fidelity AccuPrime Taq polymerase (Life Technologies) and primers con- 
taining BamHI restriction sites (shown in bold) 5’-TATTGGATCCGATGC 
GTTGAGCTAACCGGTA-3’ and 5’-TTATGGATCCTGCGCTTACACACCC 
GGCCTAT-3’. The amplified fragment was cut with BamHI and cloned in depho- 
sphorylated BamHI-cut pUC18 plasmid. A plasmid containing CP2861 23S rRNA 
(pUC23S) was fully sequenced to verify the lack of mutations in the 23S rRNA gene. 

For preparation of pRibo-T (Extended Data Fig. 1b), pAM552-A23S-AflII 
plasmid (see above) served as a recipient for the CP2861 23S rRNA gene. The 
CP2861 23S RNA gene was excised from the pUC23S plasmid by BamHI digestion 
and gel purified. To graft the CP2861 23S rRNA gene into the 16S rRNA gene, 
the plasmid backbone was prepared by PCR-amplifying the plasmid pAM552- 
A23S-AflII (Sng in 50ul reaction) using primers introducing poly-A linkers 
and sequences corresponding to H101 of 23S rRNA (underlined) and h44 in 
16S rRNA (italicized) TTAGTACCGGTTAGCTCAACGCATCG(T)7_13CGAA 
GGTTAAGCTACCTACTTICTITTGC (reverse primer with tether T1) and TTG 
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ATAGGCCGGGTGTGTAAGCGCAG(A)7_12GGAGGGCGCTTACCACTITGT 
(forward primer with tether T2). The PCR reaction, which was catalysed by 
Phusion High Fidelity DNA polymerase, was carried out under the following 
conditions: 98°C for 2 min followed by 30 cycles of (98°C, 30s; 62°C, 30s; 
72°C, 2 min) followed by 72°C for 5 min. The resulting 4.6-kilobase (kb) PCR 
fragment was treated with DpnI for 4h at 37 °C and purified using Wizard SV Gel 
and PCR Clean-Up kit (Promega). The PCR-amplified plasmid backbone and the 
gel-purified CP2861 23S rRNA gene fragment were combined in a Gibson 
Assembly reaction. Five microlitres of the reaction mixture was transformed into 
50 pl electrocompetent POP2136 E. coli cells. Cells were plated onto LB/agar plate 
supplemented with 1004gml~' ampicillin. After 24h incubation at 30°C, the 
colonies appeared. Seventeen colonies were picked, grown in LB/ampicillin 
at 30°C, plasmids were isolated and linkers were sequenced using the primers 
5'-GAACCTTACCTGGTCTTGACATC-3’ (corresponding to the 16S rRNA 
sequence 976-998) and 5'-ATATCGACGGCGGTGTTTG-3’ (corresponding 
to the 23S rRNA sequence 2476-2495) to verify the complexity of the linker library 
(Supplementary Table 2). All the colonies were then washed off the plate and total 
plasmid was extracted and used to transform SQ171-competent cells. 
Functional replacement of the wild-type ribosome by Ribo-T. SQ171 cells 
carrying the pCSacB plasmid, which contains the wild-type rrnB operon, were 
transformed with the total pRibo-T preparation isolated from the POP2136 cells. 
In brief, 250 ng of plasmid preparation were added to 250 ul of rubidium-chloride- 
competent cells. Cells were incubated for 45 min on ice, 45s at 42°C and then 
2 min on ice followed by addition of 1 ml SOC medium and incubation at 37 °C for 
2h with shaking. A 150-l aliquot of the culture was transferred to 1.85 ml SOC 
supplemented with 100 1g ml! ampicillin and 0.25% sucrose (final concentra- 
tions) and grown overnight at 37°C with shaking. Cells were spun down and 
plated on an LB agar plate containing 100p1gml~ ampicillin, 5% sucrose 
and Imgml' erythromycin. Eighty of the colonies that appeared after 48-h 
incubation of the plate at 37 °C were inoculated in 2 ml LB supplemented with 
100 ppg ml~! ampicillin and grown for 48h. The growth rate of ~30 clones that 
managed to grow during that period was then assessed in LB/ampicillin medium 
in the 96-well plate. Plasmids were isolated from six faster growing clones and 
linkers were sequenced. The linker T1 in five sequenced clones was composed of 9 
adenines and linker T2 was composed of 8 adenines, while one clone had the 
reverse combination. Total RNA was extracted from these clones using RNeasy 
Mini Kit (Qiagen) and analysed by agarose electrophoresis. The successful replace- 
ment of the wild type pCSacB plasmid with the pRibo-T plasmids carrying Ribo-T 
was verified by PCR using primers 5’-GACAGTTCGGTCCCTATCTG-3’ (cor- 
responding to the 23S rRNA sequence 2599-2618) and 5'-TTAAGCCTCACG 
GTTCATTAG-3’ (complementary to the 23S rRNA sequence 2880-2900) and 
additionally verified by primer extension on the total cellular rRNA as indicated in 
the Extended Data Fig. 4. The growth of the cells was monitored at 37 °C in 150 pl 
of LB supplemented with 100 1g ml” of ampicillin in the wells of a 96-well plate in 
the TECAN microplate reader (15 min orbital shaking with a 3-mm amplitude 
followed by 5 min rest before reading). The doubling time (t) values estimated 
from the logarithmic parts of the growth curves are indicated in Extended 
Data Fig. 4a. 

Polysome analysis. The cultures of cells (250 ml) of the SQ171fg strain trans- 
formed with either pAM552 (wild-type) or pRibo-T8/9 were grown at 37 °C with 
vigorous shaking. When the optical density reached Agoo nm 0.4-0.7, chlor- 
amphenicol solution was added to obtain final concentration of 125 pg ml7' 
and, after 5 min, cells were pelleted by centrifugation at 4°C. Polysomes were 
prepared following the published protocol”’ by freeze-thawing in the lysis buffer 
(20mM Tris-HCl, pH 7.5, 15mM MgCl.) supplemented with 1 mgm‘ lyso- 
zyme 0.25% sodium deoxycholate and 2 U of RQ1 DNase (Promega). The lysates 
were centrifuged at 20,000g for 30 min at 4 °C and polysomes-containing super- 
natants (20 A260 nm absorbance units) were loaded onto the 12-ml 10-50% sucrose 
gradient (buffer: 20 mM Tris-HCl, pH 7.5, 10 mM MgCl,, 100 mM NH,Cl, 2 mM 
B-mercaptoethanol). Polysomes were resolved by centrifugation in a SW-41 rotor 
(39,000 r.p.m., 3 h, 4 °C). Gradients were fractionated using BioComp Instrument 
gradient fractionator and fractions were collected in the wells of a 96-well plate. 
Appropriate fractions were pooled, ribosomes were ethanol-precipitated and 
resuspended in 2001 of buffer containing 300mM sodium acetate, pH 5.5, 
5mM EDTA, 0.5% SDS. rRNA was isolated by successive extractions with phenol 
(pH 6.6), phenol/chloroform and chloroform. After ethanol precipitation, RNA 
was analysed by non-denaturing agarose gel electrophoresis. 

Analysis of protein synthesis rate and proteins synthesized in Ribo-T cells. The 
protein synthesis rate in SQ171fg cells expressing either wild-type ribosomes 
(plasmid pAM552) or Ribo-T (pRibo-T plasmid) was measured by following 
incorporation of [*°S]L-methionine into proteins as described’°. Specifically, 
0.25 Ci of [*°S]i-methionine (specific activity 1,175 Ci mmol ') (American 
Radiolabeled Chemicals) was added to 1 ml of exponentially growing cells at 


37 °C, and after a 45s incubation, proteins were precipitated by addition of 1 ml 
of ice-cold 25% trichloroacetic acid (TCA) containing 2% casamino acids. After 
incubating for 30 min on ice and then 30 min at 100°C, samples were passed 
through G4 glass fibre filters. The filters were washed three times with 3 ml of 
ice-cold 5% TCA, and once with 3 ml of acetone and air dried, and the amount of 
retained radioactivity was determined by scintillation counting. Preliminary mea- 
surements of the time course of [*°S]L-methionine incorporation in the faster- 
growing SQ171fg/pAM552 cells showed that radioactivity curve plateaus after 
120s of incubation of cells with [°°S]L-methionine. 

Exponential cultures (250 ml) of the SQ171fg strain transformed with either 

pAM552 (A2058G) or pRibo-T8/9 growing in LB medium supplemented with 
100 pg ml~? ampicillin and 50 jig ml’ spectinomycin were collected by centrifu- 
gation and cells were flash-frozen in liquid nitrogen. Protein isolation and two- 
dimensional gel electrophoresis was performed by Kendrick Labs. 
Preparation of Ribo-T and wild-type ribosomes and analysis of their RNA and 
protein content. Ribosomes were prepared from the exponentially growing cells 
of the SQ171fg strain transformed with either pAM552 (wild-type) or pRibo-T8/9 
as described**. RNA was phenol extracted, precipitated as previously described 
and resolved by electrophoresis in a denaturing 6% (acrylamide:bis-acrylamide 
ratio 1:19, w/w) polyacrylamide gel (for the 5S rRNA analysis) or 4% (acrylami- 
de:bis-acrylamide ratio 1:29, w/w) polyacrylamide gel (for the analysis of 
large rRNAs). 

Ribo-T-associated ribosomal proteins were analysed by mass spectrometry at 
the Proteomics Center of Excellence, Northwestern University. Ribosomes were 
precipitated by incubation in 20% trichloracetic acid at 4 °C overnight and cent- 
rifugation at 14,000g for 10 min. Precipitated ribosomes were washed once with 
cold 10% trichloracetic acid and twice with acetone. The pellet was air-dried for 
10-20 min before resuspension in 20 ul 8 M urea. Proteins were reduced with 
10 mM dithiothreitol, and cysteine residues alkylated with 50 mM iodoacetamide 
in the final volume of 160 pil. Sequencing-grade trypsin (Promega) was added at a 
1:50 enzyme:protein ratio, and after overnight digestion at room temperature, the 
reaction was stopped by addition of formic acid to 1%. After digestion, peptides 
were desalted using C18 Spin columns (Pierce, 89870) and lyophilized. Amino- 
reactive tandem mass tag (TMT) reagents (126/127, Thermo Scientific, 90065) 
were used for peptide labelling. The reagents were dissolved in 41 l acetonitrile 
and added to the lyophilized peptides dissolved in 100 pl of 100 mM triethylam- 
monium bicarbonate. After 1 h at room temperature, the reaction was quenched 
by adding 8 ul of 5% hydroxylamine. After labelling, the two samples under 
analysis were mixed in 1:1 ratio. Peptides were desalted using C18 ZipTip 
Pipette Tips (EMD Millipore) and resuspended in 30 ul of solvent A (95% water, 
5% acetonitrile, 0.2% formic acid). 

Peptides were analysed using nanoelectrospray ionization on an Orbitrap Elite 
mass spectrometer (Thermo Scientific). Proteome Discoverer (Thermo Scientific) 
and the Sequest algorithm were used for data analysis. Data were searched against 
a custom database containing UniProt entries using E. coli taxonomy, allowing 
three missed cleavages, 10 p.p.m. precursor tolerance, and carbamidomethylation 
of cysteine as a static modification. Variable modifications included oxidation of 
methionine, TMT of lysine and amino-terminal TMT. For quantification via 
the reporter ions the intensity of the signal closest to the theoretical m/z, within 
a +10-p.p.m. window, was recorded. Reporter ion intensities were adjusted based 
on the overlap of isotopic envelopes of all reporter ions as recommended by the 
manufacturer. Only peptides with high confidence were used for quantification. 
Ratios of 126/127 were normalized based on median. 

Sucrose gradient analysis of ribosomes and ribosomal subunits. Wild-type 70S 
ribosomes or Ribo-T isolated from SQ171fg cells as described above were diluted 
approximately 70-fold in high Mg”* buffer (20mM Tris-HCl, pH 7.5, 100mM 
NH,Cl, 2mM 2-mercaptoethanol, 15 mM MgCl,) or low Mg*” buffer (20mM 
Tris-HCl, pH 7.5, 100mM NH,Cl, 2mM 2-mercaptoethanol, 1.5mM MgCl). 
After incubation for 30min at 4°C, ribosomes and subunits were resolved in 
10-40% 12-ml sucrose gradients prepared with the same buffers. Gradients were 
centrifuged in the SW41 rotor at 38,000 r.p.m. for 3h at 4°C. Ribosome profiles 
were then analysed using gradient fractionator (BioComp Instrument). 

Probing the structure of the Ribo-T tethers. The structure of the tethers was 
probed by dimethylsulfate (DMS) modification following a published protocol”. 
In brief, 10 pmol of Ribo-T or wild-type ribosomes were activated by incubation 
for 5 min at 42 °C in 50 ul of buffer 80 mM HEPES-KOH, pH 7.6, 15 mM MgCh, 
100 mM NH, Cl containing 20 U of RiboLock RI RNase inhibitor (Thermo Fisher 
Scientific). Two microlitres of DMS (SIGMA) diluted 1:10 in ethanol were added 
(2 ul of ethanol were added to the unmodified controls) and samples were incu- 
bated for 10min at 37°C. The modification reaction was stopped and rRNA 
extracted as described*’. Primer extensions were carried out using the primers 
5'-GACTGCCAGGGCATCCACCG-3’ and 5’-AAGGTTAAGCCTCACGG-3’ 
(for tether T1) or 5’-CCCTACGGTTACCTTGTTACG-3’ for tether T2. 
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Additionally, the integrity of the tethers in the Ribo-T preparation was tested 

by extension of the primers annealing immediately 3’ to the tether. Primer 
5'-GTACCGGTTAGCTCAACGCATC-3’ was extended by reverse transcriptase 
across tether T1 in the presence of dATP, dTTP, dGTP and ddCTP, and primer 
5'-CACAAAGTGGTAAGCGCCCTCCT-3’ was extended across tether T2 in the 
presence of dATP, dTTP, dCTP and ddGTP. 
Testing Ribo-T activity in cell-free translation system. The DNA template 
containing the T7 promoter and the sf-GFP gene'* was PCR amplified from a 
pY71-sfGFP plasmid” using primers 5’-TAATACGACTCACTATAGGG-3’ and 
5'-CTTCCTTTCGGGCTTTGTT-3’. GFP mRNA was prepared by in vitro tran- 
scription and purified by size-exclusion chromatography on a Sephadex G50 
mini-column, phenol extraction and ethanol precipitation. The transcript was 
translated in the A(ribosome, amino acid, tRNA) PURExpress system kit (New 
England Biolabs). A typical translation reaction was assembled in a total volume of 
10 pl and contained 2 ul of the kit solution A, 1.2 pl of factor mixture, 1 pl amino 
acid mixture (3 mM each), 1 pl tRNA (20 pg ml '), 0.4 ul Ribolock RNase inhib- 
itor (40 U pl‘), 5 pg (~20 pmol) GFP transcript and 22 pmol of wild-type ribo- 
somes or Ribo-T. Samples were placed in wells of a 384-well black wall/clear flat 
bottom tissue-culture plate (BD Biosciences) and covered with the lid. Reactions 
were incubated at 37 °C in a microplate reader (Tecan), and fluorescence values 
were recorded every 20 min at Aexc = 488 nm and Aen = 520 nm over 7h. Protein 
synthesis rates were calculated by linear regression over the time points 0, 40 and 
60 min with a R?>0.9 using the trendline function of Excel (Microsoft). Time 
point 20 min was not taken into consideration because the plate was switched from 
ice to 37 °C at time 0. 

Transcription/translation of the dihydrofolate reductase template supplied with 
the A(ribosome, amino acid, tRNA) PURExpress kit (New England Biolabs) was 
carried in the presence of [°°S]L-methionine (1,175 Cimmol *) using manufac- 
turers protocol. A typical 5 pl reaction, assembled as described above but using 
50ng of the DNA template, was supplemented with 5 ,1Ci [°°S].-methionine and 
10 pmol of wild-type or Ribo-T ribosomes. When needed, the reactions were 
supplemented with 501M erythromycin. Reactions were incubated for 2h at 
37 °C, and protein products were analysed by SDS-PAGE in 16.5% Bis-Tris gels 
(Biorad) using NuPAGE MES/SDS running buffer (Invitrogen). Gels were stained, 
dried and exposed to a phosphorimager screen overnight. Radioactive bands were 
visualized by Typhoon phosphorimager (GE Healthcare). 

Toeprinting analysis. Toeprinting was performed as previously described’***. 
When needed, the threonyl-tRNA synthetase inhibitor borrelidin or the initiation 
inhibitor thiostrepton were added to the reactions to the final concentrations of 
50 uM. 

Construction of the plasmids for testing oRibo-T activity in vivo. The backbone 
plasmid pT7wtK (Extended Data Fig. 1c) was first prepared from the commercial 
plasmid T7-Flag-4 (Sigma Aldrich) by introducing the following changes. First, 
the bla gene was deleted using inverse PCR with phosphorylated primers 
5'-TAACTGTCAGACCAAGTTTACTC-3’ and 5'-ACTCTTCCTTTTTCAATAT 
TATTGAAG-3’ and Phusion High Fidelity DNA polymerase. Following purification 
with E.Z.N.A. Cycle Pure kit, DNA was blunt-end ligated for 14h at 16 °C using T4 
DNA ligase, and transformed into electrocompetent DH5« E. coli cells and plated on 
LB-agar supplemented with 30 1g ml~' kanamycin. Next, a BglII-NotI cloning site 
was introduced using phosphorylated primers 5’-AGATCTGTTGCTACGCAGCG 
TTGCGGCCGCTGAAGATCGATCTCGACG-3' and 5’-GCCTCCTATGAAA 
AAATAACAGATATAGTCTCCCTATAGTGAGTCGTATTAGG-3’, with BglII 
and Not! sites in bold. A sequence 3’ of the T7 promoter, termed N15 (underlined), 
optimized for T7 expression of an orthogonal gene** was introduced on one of the 
primers. Purified PCR product was blunt-end ligated with T4 DNA ligase for 14h at 
16 °C, transformed into DH5a electrocompetent cells and plated on LB-agar supple- 
mented with 30 .gml’ kanamycin. The resulting plasmid pT7wtK contains a T7 
promoter, wild-type Shine-Dalgarno sequence, a BglII-NotI cloning site, T1/T2 
terminator, pMB1 origin of replication, a /acI gene and a kanamycin resistance gene. 

To create plasmid pT7wtGEP, primers 5'‘-GGTGGTAGATCTATGAGCAAA 
GGTGAAGAAC-3’ and 5'-GGTGGTGCGGCCGCGGGCTTTGTTAGCAG-3’ 
were used to PCR amplify the sf-gfp gene from pY71-sfGFP”’, adding BglII and 
Not! restriction sites (bold) at the ends of the sf-gfp PCR product. Purified PCR 
product and plasmid pT7wtK were digested with BglII and NotI (New England 
Biolabs) for 1h at 37°C. The pT7wtK digested vector was treated with alkaline 
phosphatase CIP (New England Biolabs) for 1h at 37°C. Both reactions were 
purified with E.Z.N.A. Cycle Pure kit. The sf-gfp insert was added in threefold 
molar excess to 50 ng pT 7wtK backbone, and ligated with T4 DNA ligase (NEB) 
for 14h at 16 °C, transformed into DH5a electrocompetent cells and plated on LB- 
agar supplemented with 30 pg ml’ kanamycin. 

To create pT70GFP (Extended Data Fig. 1c) containing sf-gfp, the translation 
of which is controlled by an orthogonal Shine-Dalgarno sequence, the wild- 
type Shine-Dalgarno sequence of pT7wtGFP (AGGAGG) was mutated to an 
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orthogonal sequence CACCAC (ref. 3) by inverse PCR using phosphorylated 
primers 5'-ATGAGCAAAGGTGAAGAAC-3’ and 5’-AGATCTGTGGTGTGA 
AAAAATAACAGATATAGTCTC-3’. PCR product purified with E.Z.N.A. Cycle 
Pure kit was blunt-end ligated with T4 DNA ligase for 14h at 16 °C, transformed 
into electrocompetent DH5« cells and plated on LB-agar supplemented with 
30 pg ml? kanamycin. 

Finally, the T7 promoter was replaced with the lpp5 promoter*’. To achieve that, 
inverse PCR was performed using pT70GFP as template and phosphorylated 
primers 5'-TATACTTGTGGAATTGTGAGCGGATAACAATTCTATATCTG 
TTATTTITTTCA-3’ and 5'’-ACACAAAGTTTTITTATGTTGTCAATATTTTT 
TIGATAGTGAGTCGTATTAGGATC-3’, (the lpp promoter is underlined). 
The lacO site (bold) was included to provide for inducible expression in POP2136 
strain controlled with IPTG. DNA was purified, blunt-end ligated, transformed into 
DH5z. cells and plated on LB-agar supplemented with 30 tg ml”! kanamycin. The 
resulting plasmid pLpp50GFP (Extended Data Fig. 1c) contains a lpp5 promoter, 
lacO site, orthogonal Shine-Dalgarno sequence, sf-gfp gene, T1/T2 terminator, pMB1 
origin of replication, a Jacl gene and a kanamycin-resistance gene. 

The anti-Shine-Dalgarno sequence of pRibo-T 16S rRNA was mutated from 
wild-type (5’-TCACCTCCTTA-3’) to an orthogonal sequence (5'-TCATTG 
TGGTA-3’)’ by inverse PCR using phosphorylated primers 5’-CCTTAAAGAAG 
CGTACTTTGTAG-3’ and 5'-TACCACAATGATCCAACCGCAGG-3’, pRibo- 
T as template and Phusion High Fidelity DNA polymerase. PCR was run at the 
following conditions: 98 °C, 3 min followed by 25 cycles (98 °C, 30 s; 55 °C, 30 s; 
72°C, 120s), followed by final extension 72°C, 10 min. Correct size band was 
purified by agarose gel electrophoresis and extracted using the E.Z.N.A. Gel 
Extraction kit. It was circularized by blunt-end ligation and transformed into 
POP2136 electrocompetent cells. Cells were plated on LB/agar plates supplemen- 
ted with 50 ug ml * carbenicillin and grown at 30°C overnight. Colonies were 
isolated and poRibo-T was fully sequenced. 

Testing activity of oRibo-T in vivo. Electrocompetent POP2136 cells were trans- 
formed with the following plasmid combinations: (1) pAM552 and pT7wtK (no 
gfp control), (2) pAM552 and pLpp50GFP, (3) pAM552o0 and pLpp50GEP, and 
(4) poRibo-T1 and pLpp50GFEP. Transformants were plated on LB plates supple- 
mented with 50 pg ml carbenicillin and 30 tg ml’ kanamycin and incubated 
for 24h at 30°C. Wells of a 96-well plate with low evaporation lid (Costar) was 
filled with 100 ul of LB media supplemented with 501g ml’ carbenicillin and 
30 pg ml~’ kanamycin. The wells were inoculated with colonies from each plas- 
mid combination above (six colonies each), and incubated at 30 °C for 14h with 
shaking. Clear bottom chimney wells of another 96-well plate (Costar) were filled 
with 100 pl of LB media supplemented with 50 pg ml’ carbenicillin, 30 pg ml! 
kanamycin, and 1 mM IPTG. The plate was inoculated with 2 ul of saturated initial 
inoculation plate, and incubated with linear shaking (731 cycles per min) for 16h 
at 42 °C on a Biotek Synergy H1 plate reader, with continuous monitoring of cell 
density (Agoo nm) and sfGFP fluorescence (excitation 485 and emission 528 with 
sensitivity setting at 80). 

Testing oRibo-T activity in cell-free translation system. Ribosomes (wild-type) 
or oRibo-T (mixed with wild-type ribosomes) were prepared from $Q171fg cells 
transformed with pAM552 or poRibo-T1, respectively. An orthogonal sf-gfp gene 
was PCR amplified from the plasmid pT7oGFP using primers 5’-TAATA 
CGACTCACTATAGGG-3’ and 5'-ACTCGTCGAGATCGATCT-3’. The tran- 
scription-translation reaction was carried out in A(ribosome, amino acid, tRNA) 
PURExpress system as described above. The 7.5-1l reactions were supplemented 
with 18.75 ng DNA template and 7.5 pmol ribosomes, and when needed, clinda- 
mycin or pactamycin were added to the reactions to the final concentrations of 
50 uM or 100 1M, respectively. 

For in vitro translation of an orthogonal secM-lacZa template, it was PCR- 
amplified from the poSML plasmid using a direct primer 5’-TAATACGACT 
CACTATAGGG-3’ corresponding to the T7 promoter and a reverse primer 5’- 
TTCCCAGTCACGACGTT-3’, which allowed preserving 18 codons after the 
SecM arrest site. mRNA was prepared by in vitro transcription and purified. It 
was then translated in the A(ribosome, amino acid, tRNA) PURExpress system 
assembled in a total volume of 5 pil and containing 1 jl of the kit solution A, 0.6 ul 
of factor mixture, 0.5 pl amino acid mixture (3mM each) lacking methionine, 
0.2 ul of [°°S].-methionine 8.5 1M (1,175 Cimmol'), 0.5 pl tRNA (20 pg ml’), 
0.2 pl Ribolock RNase inhibitor (40 U pl~'), 100 1M pactamycin, 10 pmol tran- 
script and 10 pmol of total ribosomes. Translation was carried out for 5 min at 
37 °C, followed by addition of 1 j1g of RNase A and incubation for 5 min at 37 °C. 
Translation products were analysed in 16.5% Tricine SDS-PAGE”. The gel was 
stained, dried, and exposed to a phosphorimager screen overnight. 
Construction of C41(DE3)/AlacZ58(M15). The AlacZ58(M15) allele required 
for alpha complementation was transduced from the E. coli strain K1342 (E. coli 
Genetic Stock Center, Yale) into E. coliC41(DE3) strain by P1 phage transduction 
protocol°®. Transductants were selected on LB agar supplemented with 10 1g ml 
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tetracycline. Then colonies were re-streaked on LB-agar plates containing 
104g ml tetracycline, 200 1M IPTG and 801g ml~* X-Gal. The replacement 
of wild-type lacZ with the 4lacZ58(M15) allele was verified by PCR using primers 
5'-ACCATGATTACGGATTCACTGG-3’ and 5'-CCGTTGCACCACAGATG 
AA-3' (the sizes of the expected PCR products are 467 bp for wild-type and 
374 bp for the mutant). 

Construction of the orthogonal SecM-lacZa reporter poSML. The backbone of 
the pACYC177 vector was PCR-amplified using primers 5’-ATCTCATGACCAA 
AATCCCTTAACGTGAGT-3’ and 5'-GCGGTTAGCTTTTACCCCTGCATCT 
TTGAG-3’. A 568-bp DNA fragment in which the ends overlapped with the 
amplified pACYC177 backbone and which contained T7 promoter, the ortho- 
gonal Shine-Dalgarno sequence CACCAC’, the secM(121-166)-lacZa fusion from 
the plasmid pNH122 (ref. 18), was synthesized by Integrated DNA Technologies. 
The pACYC177 backbone and the secM-lacZa construct were combined using 
Gibson Assembly and introduced in the C41(DE3)/AlacZ58(M15) cells. 
Construction of the 2451/2452 mutant poRibo-T library and selecting 
mutants capable of alleviating SecM-mediated translation arrest. A library of 
A2451N/C2452N mutants was generated by inverse PCR using plasmid poRibo- 
T2 asa template, Phusion High Fidelity DNA polymerase, and primers 5'-AGGC 
TGATACCGCCCAAG-3’ and 5'-CTCTTGGGCGGTATCAGCCTNNTATCC 
CCGGAGTACCTTTTATC-3’, with added sequence (underlined) used for 
re-circularization with Gibson assembly. PCR reaction was carried out under 
the following conditions: 98 °C, 3 min followed by 25 cycles (98°C, 30s; 55 °C, 
30s; 72 °C, 120s), followed by final extension 72 °C, 10 min. The PCR-amplified 
DNA band was purified by extraction from the agarose gel with an E.Z.N.A. gel 
extraction kit, and re-circularized by Gibson assembly for 1 h at 50 °C. Two micro- 
litres of the reaction were transformed into electrocompotent POP2136 cells pla- 
ted on LB plates supplemented with 50 jig ml“! carbenicillin and grown for 24h at 
30 °C. Individual colonies were picked and sequenced to identify all possible 16 
variants of the library. 

The C41(DE3)/AlacZ58(M15) cells were transformed with the poSML reporter 
plasmid (Extended Data Fig. 1d) and plated on LB-agar containing 50 yg ml‘ 
kanamycin. One of the colonies, which appeared after overnight incubation at 
37°C, was inoculated into liquid culture, grown in the presence of 50 pg ml! 
kanamycin and cells were rendered chemically competent. Cells were transformed 
with the pooled library of 16 2451/2452 mutants. Transformed cells were plated on 
LB agar containing 50 1g ml * kanamycin, 100 1g ml * ampicillin, 0.5 mM IPTG, 
40 jig ml’ X-Gal and 2 mM lacZ inhibitor phenylethyl-B-d-thiogalactopyranoside 


(PETG). Plates were incubated at 37 °C for 24h and photographed. Sixteen white 
colonies or fifteen blue colonies were inoculated in 5 ml of LB medium supplemen- 
ted with 100 1g ml! ampicillin and grown overnight. The plasmids were isolated 
and the identities of nucleotide residues at the position 2451 and 2452 of the 23S 
rRNA were analysed by sequencing. Alternatively, the poSML-transformed 
C41(DE3)/AlacZ58(M15) cells were transformed with individual plasmids repre- 
senting all possible 16 variants of the nucleotide combinations at positions 2451 and 
2452. The poRibo-T2 plasmid carrying A2058G mutation was used as a control. In 
addition, the poRibo-T2 plasmid carrying the U2585G mutation was included in 
the transformation experiment. The transformed cells were plated on LB/agar 
containing 50pgml' kanamycin and 100pgml~' ampicillin and incubated 
overnight at 37°C. Three colonies from each transformation were then streaked 
on LB/agar plates containing 50 tg ml’ kanamycin and 100 pg ml ampicillin 
and supplemented with 0.5 mM IPTG, 40 jig ml~' X-Gal and 2 mM PETG. Plates 
were incubated at 37 °C for 22h and photographed. 
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Extended Data Figure 1 | Key plasmids used in the study. a, The pAM552 
plasmid is a derivative of pLK35 (ref. 27), from which the unessential segments 
of the pBR322 cloning vector have been removed. pAM552 contains the entire 
rrnB operon of E. coli under the control of the phage lambda P;, promoter, 
which is constitutively active in the conventional E. coli strains but is silent at 
30 °C in the strain POP2136 (30 °C) carrying the cI857 gene of the temperature- 
sensitive lambda repressor. The 16S rRNA gene is shown in orange, and the 
16S rRNA processing stem sequences indicated in yellow. The 23S rRNA 
gene is blue, and the corresponding processing stem sequences are light blue. 
The intergenic tRNA“™ gene is shown in dark grey. b, The map of the pRibo- 
T8/9 plasmid derived from pAM552. The native 5’ and 3’ ends of the 23S 


o-SD 
ATG... 


.- TGA 


T7 term. 


poSML 
2853 bp 


rRNA were linked via a tetranucleotide sequence GAGA (connector C shown 
in green), and circularly permutated 23 rRNA gene, ‘opened’ in the apex loop of 
H101, was inserted in the apex loop of 16S rRNA helix h44 via an Ag linker 
Tl and an Ag linker T2 (red bars). c, The map of the backbone plasmid pT7wtK 
and the reporter plasmids pT70GFP and pLpp50GEP, expressing sf-gfp 
controlled by an orthogonal Shine-Dalgarno sequence (orange semi-circle) 
under T7 or lpp5 promoters (black triangles). d, The map of the pACYC177- 
derived plasmid containing the secM-lacZa reporter gene controlled by the 
T7 promoter (black triangle) and alternative Shine-Dalgarno sequence (orange 
semi-circle). The sequence of the secM-lacZa reporter matches that in the 
originally described plasmid pNH122 (ref. 18). 
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Extended Data Figure 2 | The experimental scheme of preparing and testing 
circularly permuted 23S rRNA gene library. a, The CP23S template is 
generated from pCP23S-Eagl plasmid by Eagl digestion and ligation. Each 
CP23S variant is generated by PCR using circularized 23S rRNA gene as a 
template and a unique primer pair, with added sequences overlapping the 
destination plasmid backbone. b, The plasmid backbone is prepared by 
digestion of pAM552-A23S-AflII with the AflII restriction enzyme, which 
linearizes the backbone at the 23S processing stem site. c, Gibson assembly is 


CP23S 


E 
= -—wt 23S 


3 
hated CP 23S 


—_ 
controls 


used to incorporate each CP23S variant into the plasmid backbone to 
generate the 91 target circular permutants. d, The pAM-CP23S plasmids are 
transformed into the SQ171 strain lacking chromosomal rRNA operons 

and carrying the pCSacB plasmid with the wild-type rRNA operon, and 
transformants resistant to ampicillin, erythromycin and sucrose are selected. 
e, A complete replacement of pCSacB with pAM-CP23S is verified by a three- 
primer diagnostic PCR. 
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Extended Data Figure 3 | The Ribo-T tethers allow for the ribosome are shown. 16S and 23S rRNAs in the non-rotated state are tan and pale blue, 
ratcheting. Distance changes (A) between the 16S rRNA and 23S rRNA and in the rotated state are gold and blue, respectively. The structures of the 
residues h44 and H101 connected by the oligo(A) linkers in Ribo-T when the _ E. coli ribosomes used for measuring the distances and generating the figure 
ribosome undergoes the transition from the classic to the rotated state. The have PDB accession numbers 3R8T and 4GD2 (non-rotated state) and 


distances between the 5’ phosphorus atoms of the corresponding nucleotides | 3R8S and 4GD1 (rotated state). 
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Extended Data Figure 4 | Chromosomal mutations enhance growth of 
$Q171 cells in which Ribo-T completely replace wild-type ribosomes. 

a, Growth curves of the parental $Q171 cells transformed with the 
pAM552(G2058) plasmid (black curve) or pRibo-T8/9 plasmid (blue curve) or 
selected fast growing mutant (SQ171fg) transformed with pRibo-T8/9 (green 
curve). The cells express homogeneous populations of ribosomes (wt for 
pAM552 transformants or Ribo-T for the pRibo-T8/9 transformants, see 
panels b and c). b, PCR analysis of rDNA in the SQ171fg strain transformed 
with pRibo-T8/9 (the SQ110 strain that carries a single chromosomal copy 
of the rrn allele served as a wild-type control). The PCR primers amplify the 
302-base-pair 23S rRNA gene segment ‘across’ the H101 hairpin in wild-type 
rDNA. In pRibo-T, the primer annealing sites are more than 4.8 kb apart (black 
dashed line), which prevents formation of the PCR product. Two additional 
primers designed to amplify a 467-bp fragment from the lacZ gene were 
included in the same PCR reaction as an internal control. The gel is 
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representative of two independent biological experiments. c, Primer extension 
analysis of rRNA expressed in the SQ171fg cells transformed with pAM552 
(WT), pAM552 with the A2058G mutation, or pRibo-T8/9, which carries 

the A2058G mutation. Primer extension was carried out in the presence of 
dTTP and ddCTP. Because Ribo-T contains the A2058G mutation in the 23S 
rRNA sequence, the generated cDNA is one nucleotide shorter than the one 
generated on the wild-type 23S rRNA template. The lack of the 20-nucleotide 
cDNA band in the Ribo-T sample demonstrates the absence of wild-type 

23S rRNA in the SQ171fg cells transformed with pRibo-T8/9. The gel is 
representative of three independent biological experiments. d, e, Chromosomal 
mutations in SQ171fg: a nonsense mutation in the Leu codon 22 of the ybeX 
gene encoding a protein similar to Mg”*/Co”~ efflux transporter (d); and a 
missense mutation in codon 549 of the rpsA gene encoding ribosomal 
protein S1 (e). 
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Extended Data Figure 5 | Ribo-T composition and integrity of the linkers. 
a, b, Analysis of rRNA extracted from the isolated wild-type ribosomes or 
Ribo-T in a denaturing 4% (a) or 8% (b) polyacrylamide gel. a, Ribo-T(1) and 
Ribo-T(2) represent two individual preparations with Ribo-T(2) isolated 
following the standard procedure (see Methods), and Ribo-T(1) isolated by 
immediate pelleting through the sucrose cushion after the cell lysis. The 

faint bands in the Ribo-T2 preparation indicated by the asterisks could be 
occasionally seen in some preparations; they probably represent rRNA 
fragments generated by cleavage of the linkers in a small fraction of Ribo-T 
either in the cell or during Ribo-T preparation. b, 5S rRNA is present in Ribo-T. 
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c, The relative abundance of small and large subunit proteins in Ribo-T in 
comparison with wild-type ribosome as determined by mass spectrometry 
(protein L26 could not be reliably quantified in Ribo-T and wild-type 
ribosomes). The data represent the average of three technical replicates, and 
error bars indicate the s.d. d, Analysis of the integrity of the T1 and T2 linkers 
in a Ribo-T preparation by primer extension. The 22-nucleotide-long primer 
was extended across the T1 linker in the presence of ddCTP terminator and 
the 23S-nucleotide-long primer was extended across the T2 linker in the 
presence of ddGTP terminator. Control samples (—) represent the unextended 
primers. The gels are representative of two independent experiments. 
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Extended Data Figure 6 | Ribo-T can successfully translate most cellular the average values of experiments performed in two biological replicates 
polypeptides. a, Protein synthesis rate in SQ171fg cells expressing wild-type each done in two technical duplicates. Error bars denote s.d. b, c, 2D gel 
ribosomes or Ribo-T. Protein synthesis was measured by quantifying the electrophoresis analysis of the proteins expressed in exponentially growing 
incorporation of [°°S] L-methionine into TCA-insoluble protein fraction SQ171fg transformed with pAM552 (A2058G) (b) or pRibo-T (c). 


during a 45-s incubation at 37 °C in minimal medium. The bar graphs represent 
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diagrams on the right represent the secondary structures of helices H101 and 
h44 in wild-type ribosomes (left) and Ribo-T (right), with the nucleotide 
residues modified strongly, moderately and weakly indicated by black, grey and 
white circles, respectively. The gels are representative of two independent 
experiments. 


Extended Data Figure 7 | Chemical probing of the structure of the Ribo-T 
linkers. Ribo-T or wild-type ribosomes were modified by dimethylsulfate, 
and extracted rRNA was subjected to primer extension analysis. In each gel, 
the left two lanes (‘C’ and ‘A’) represent sequencing reactions followed by 
dimethylsulfate-modified sample and control (unmodified) RNA. The 
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Extended Data Figure 8 | Translation of the orthogonal sf-gfp gene by 
oRibo-T in vivo and in vitro. a, Expression of an orthogonal sf-gfp reporter in 
the E. coli POP2136 cells transformed with pAM552 plasmid encoding wild- 
type rRNA (wt Rbs), pAM552 with an orthogonal Shine-Dalgarno sequence in 
16S rRNA of a non-tethered ribosome (oRbs) or poRibo-T1 expressing an 
orthogonal Ribo-T (green bar). Cells lacking gfp reporter gene (wt Rbs Agfp) 
were used as a background fluorescence control. The data represent the average 
value of six biological replicates in technical triplicates; error bars indicate 
the s.d. b, In vitro translation of the orthogonal sf-gfp reporter by non-tethered 
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non-orthogonal wt ribosomes (pink lines), or oORibo-T(A2058G) (which 

also contained cellular wild-type ribosomes) (green lines). The dotted lines 
correspond to the translation reactions without antibiotic and solid lines 
represent reactions supplemented with 50 1M clindamycin (Cld). ¢, Same as 
in b, but oRibo-T contained a G693A mutation instead of A2058G and 
clindamycin was replaced with 100 |1M pactamycin (Pct). The red stars indicate 
the ribosomal subunit carrying the antibiotic-resistance mutation. Graphs in 
b and c are each representative of two biological replicates each performed 

in technical triplicates, and error bars indicating the s.d. 
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Extended Data Figure 9 | Promoter mutation in oRibo-T improves poRibo-T2 revealed a single mutation in the P;, promoter controlling Ribo-T 
transformation of the E. coli cells. a, b, Several E. coli strains, including BL21 _ expression, altering the ‘-10’ box from GATACT to TATACT bringing it 
shown in this figure, as well as JM109 and C41, produced slowly growing, closer to the TATAAT consensus. It is unclear why the promoter mutation 
heterogeneous colonies when transformed with poRibo-T1. ¢, Fortuitously, in improves performance of poRibo-T (as well as of non-orthogonal pRibo-T) in 
the course of the experiments we isolated a spontaneous mutant plasmid, ‘unselected’ E. coli cells. The plates show representative results of three 
poRibo-T2, which showed improved transformation efficiency, producing independent biological experiments. 


evenly sized colonies after a single overnight incubation. Sequencing of 
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Extended Data Table 1 | Characterization of the growth of E. co/i SQ171 cells expressing a pure population of ribosomes with circularly 
permuted 23S rRNA 


Doubling time (min) * Cell density (ODeoo ) at saturationt 
30°C 37°C 30°C 37 °C nfl 
pAM552 $ 61.04 3.2 53.9 + 1.0 1.04 + 0.06 0.93 + 0.03 4 
pAM552-Afill § 67.4+1.0 53.3 + 2.4 1.07 + 0.01 0.97 + 0.00 4 
cP67 |! 106.4+5.4 69.6 + 2.1 0.83 + 0.05 0.41 + 0.07 3 
CP95 144.94+35.9 8244244 0.66 + 0.31 0.51 + 0.18 6 
CP104 90.8 + 10.3 52.7 + 3.2 0.98 + 0.03 0.95 + 0.02 3 
CP168 123.84+27.9 57.741.9 0.70 + 0.22 0.88 + 0.12 10 
CP281 100.1411.0 546+10.1 1.01 + 0.04 0.93 + 0.13 3 
CP549 101.7418.2 46543.9 1.00 + 0.02 0.98 + 0.03 3 
CP617 231.7420.5 91.54 18.5 0.16 + 0.03 0.85 + 0.05 4 
CP634 16204342 21254581 0464019 0.50 + 0.10 3 
CP879 106.6 +4.7 51.4446 1.03 + 0.02 0.99 + 0.04 3 
CP891 14454418 60.7+4.1 0.56 + 0.43 0.76 + 0.23 6 
CP1112 89.6 + 6.0 57.8 + 12.2 0.96 + 0.02 0.91 + 0.12 3 
CP1178 102541100 46.241.3 0.96 + 0.02 0.99 + 0.01 3 
CP1498 16754175 118.0417.1 0.56+40.32 0.52 + 0.19 3 
CP1511 131.5+44.2 76.741.5 0.88 + 0.01 0.88 + 0.01 3 
CP1587 98.14 12.4 55.1+6.6 0.93 + 0.05 0.92 + 0.08 3 
CP1716 17444319 11784165 0444016 0.62 + 0.34 3 
CP1733 117.3 48.2 83.84 2.2 0.95 + 0.01 0.80 + 0.01 3 
CP1741 230.04+14.7 269.0+503 0.28+0.00 0.66 + 0.09 3 
CP1873 108.4+6.5 52.9 + 0.8 0.94 + 0.01 0.91 + 0.01 3 
CP2148 83.04 2.9 52.4 + 3.9 0.73 + 0.09 0.82 + 0.02 4 
CP2800 85.9 + 15.7 53.5 + 9.7 1.04 + 0.03 0.91 + 0.12 3 
CP2861 13844107 93.7445 0.88 + 0.00 0.83 + 0.04 3 


*Growth in 100 ul LB media supplemented with 50 wg ml? carbenicillin in 96-well plate with shaking. 

+After 18 h of growth. 

tpAM552: wild-type rrnB operon. 

§pAM552-Aflll: rrnB operon with the 23S rRNA mutations G2C and C2901G used to introduce the Aflll restriction sites. 

||CPx: rrnB with 23S circular permutations and G2C/C2901G mutations; x indicates the 5’ starting nucleotide of the circularly permuted 23S gene. 
Biological replicates are indicated in the ‘n’ column, which is the number of separate colonies that were used for each mean number and s.d. 
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Inventory -tracking systems range from paper filing to custom-made databases. 
Using the right system can save researchers time, money and frustration. 


BY JEFFREY M. PERKEL 


hen Marilyn Goudreault received 
a request for plasmids stored in 
the repository of the laboratory 


she manages at the Lunenfeld~Tanenbaum 
Research Institute in Toronto, Canada, there 
was never any question whether she would 
honour it. Reagent sharing is typically a pre- 
condition of publication in peer-reviewed 
journals, and is fundamental to the scientific 
process. But first, Goudreault would have to 
find the plasmids — circular strings of DNA. 


In many labs, the task might have required 
a tortuous search through old notebooks, 
out-of-date spreadsheets and frost-encrusted 
freezer boxes. But in Goudreault’s lab, rea- 
gents are tracked with OpenFreezer: a free, 
web-based system designed to document data 
such as the location, source, creator and bio- 
logical properties of every reagent in a user’s 
possession — including not just plasmids, but 
also antibodies and stretches of DNA, RNA 
and protein. Goudreault needed only to run 
a quick search for the materials, then retrieve 
the indicated boxes from storage. “I had 
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everything within 15 minutes, she says. 
OpenFreezer is one of a number of 
computerized inventory systems developed 
to simplify lab management. They range from 
simple homespun databases for individual 
labs to enterprise-level systems, and accom- 
modate a range of budgets. Some are designed 
for documenting frozen samples; others for 
tracking chemicals or lab animals. Some 
facilitate purchasing and equipment schedul- 
ing; others are limited to simple descriptions. 
But in all cases, the goal is to ensure that lab 
workers know what resources are available >» 
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» to them, and where to find them. 

Many labs track their inventories with 
nothing more than sheets of paper ina binder 
or entries in an Excel spreadsheet. But some 
are using more-sophisticated database soft- 
ware. In the late 1990s, for example, virolo- 
gist Joe Mymryk created a Microsoft Access 
database to track key reagents when he set 
up his lab at Western University in London, 
Canada. In 2007, his graduate student Ahmed 
Yousef joined Ibrahim Baggili, a computer- 
science graduate student then at Purdue Uni- 
versity in West Lafayette, Indiana, to develop 
a friendlier, Windows-based interface to the 
system, called LINA (Laboratory Inventory 
Network Application; A. F. Yousef et al. J. Lab. 
Automat. 16, 82-89; 2011). 

LINA draws from a series of Access 
databases — one for each class of reagent, 
including bacterial and yeast strains and short 
sequences of DNA and RNA known as oligo- 
nucleotides. As new reagents are developed or 
acquired, they are logged in the system, which 
assigns each one a unique identifier. Samples 
are then organized in freezer boxes accord- 
ing to those numbers and users can search the 
database by keyword, source and function. 


SEARCH AND RESCUE 

For Mymryk, LINA’s most useful feature is a 
tool to search and compare DNA sequences. 
This means that he can enter a gene sequence 
and check whether the library contains any 
oligonucleotides that could be used to amplify 
it, rather than just ordering new ones. “The 
oligo thing has really saved my bacon,” he says. 

LINA is free and simple to use, which 
makes it particularly attractive for small 
molecular-biology labs. But more-advanced 
options are also available at no cost. Marie 
Ebersole, who manages the chemistry prepa- 
ration room at Wellesley College in Massa- 
chusetts, opted to upgrade her Excel-based 
system to Quartzy, a free cloud-based system 
that allows her to track purchases for her 
1,000-reagent collection. Quartzy’s zero cost 
figured prominently in her decision. “I didn’t 
have to have ‘buy-im from 12 different peo- 
ple in 3 departments, and I could upload my 
existing spreadsheets,” she says. 

Ebersole uses Quartzy mainly for tracking 
dry and liquid chemicals. But it can also track 
freezer boxes, so that users know precisely what 
each slot of a given container holds. When 
stocks run low, users click on a button to reor- 
der, and the system automatically alerts the 
manager so that she or he can track the order's 
status. (The system is able to offer its service 
for free because it incorporates catalogues 
from several reagent vendors, and suggests 
those products when orders are placed.) Other 
features include support for tracking barcodes 
attached to individual samples, as well as equip- 
ment scheduling and document management 
for maintaining lab manuals and the like. 

For Ebersole, Quartzy’s features not only 


improve the efficiency of the lab, they cut 
down on her costs. “I’ve saved about a third 
of my budget,” she says. In part, that is because 
there is less waste: by knowing precisely what 
chemicals she has to hand, Ebersole can use 
up old reagents before buying fresh ones. And 
when she does buy new chemicals, she says, 
she can do so in smaller quantities than before. 

Another option is StrainControl, which 
has been developed by DNA Globe of Umea, 
Sweden. The software is free for individual 
researchers in small labs; a professional licence 
for 10 users costs US$79.95; and a 50-user 
licence costs $649.95. Both of the paid versions 
allow the software to be used on a computer 
network or cloud-based service. 


Although its name 

“We believe evokes images of fruit 
that providing ae nme mice, Strain- 
= ontrol can accom- 
hig a modate the resources 
of most wet-lab biol- 

reagents and ogists, says Kristof- 
methods coupled fer Lindell, DNA 
with linkage to Globe’s external- 
experimental relations manager. 
data will help The software, which 
to improve the has some 15,000 
reproducibility _ users, provides sup- 
problem.” port for different 


lab-organism strains, 
proteins, plasmids, antibodies and chemicals, 
and like some other tools, is compatible with 
sample barcoding. Users can rename any of 
the fields to suit their needs, Lindell says; as a 
result, StrainControl can be used to catalogue 
anything, whether lab-related or not. An 
imminent update will allow users to add one 
or two custom modules to the database (not 
just reconfigure existing ones), for tasks such 
as tracking references. 

Other systems are more specialized. A lab 
information-management system (LIMS) called 
mLIMS, developed by BioInfoRx in Madison, 
Wisconsin, is designed to track rodent colonies, 
for instance. Some also offer connectivity to 
electronic notebooks. Labguru, for instance, is 
a cloud-based application that tracks plasmids, 
bacteria, antibodies, plants, rodents and pro- 
teins and has a built-in electronic notebook, says 
product specialist Xavier Armand. Developed 
with investment from Nature’s parent company, 
Holtzbrinck Publishing Group in Stuttgart, 
Germany, Labguru costs $120 per user per year 
for academics and $450 per user per year for 
industry labs and is produced by BioData in 
Cambridge, Massachusetts. 

Usually, Armand explains, inventories and 
electronic lab notebooks connect details about 
each sample to experimental results, so users 
can track which reagents were used in which 
experiments. “We believe that providing high- 
fidelity metadata for reagents and methods 
coupled with linkage to experimental data will 
help to improve the reproducibility problem,’ he 
explains. It should make it easier for researchers 
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to duplicate the findings of their own and other 
labs’ experiments. Similarly, Freezer Web Access 
and Lab Inventory, both from ATGC Labs in 
Potomac, Maryland, allow users to link their 
reagents to a LIMS. Software developer Pavel 
Bolotov says that both applications cost $150 
per user and $350 per server installation — plus 
$1,000-—20,000 for customization. 


LINKED-UP APPROACH 

Some research institutes and companies 
centralize inventory management at a large 
scale. The office of Environmental Health & 
Radiation Safety (EHRS) at the University of 
Pennsylvania in Philadelphia has spent the 
past several years moving its 700 lab groups to 
the unified system CISPro, which is developed 
by BIOVIA (formerly Accelrys) in San Diego, 
California. According to EHRS lab-safety 
specialist Kimberly Bush, institution-wide 
tracking facilitates three key tasks: compliance 
reporting (for example, whether the university 
is meeting building-code limits on flammable 
materials), cross-lab material sharing and uni- 
versity-wide reagent monitoring. “Those are 
difficult or impossible to accomplish if there 
are 700 standalone inventory systems,” she says. 

In 2011, the EHRS received $50,000 from 
the Penn Green Fund, a university sustain- 
ability initiative, to implement CISPro as part 
of an effort to reduce waste and consolidate 
inventory management. Today, only about one 
in eight labs is on-board. The roll-out has been 
anything but smooth, Bush says, and exem- 
plifies the challenges of inventory tracking. 
Because the university’s chemical purchases do 
not go through a central office, each lab has to 
be trained to create and upload its own inven- 
tories. And the process of creating the database 
is cumbersome and error-prone. For instance, 
a chemical might have multiple names, and 
inconsistencies in database set-up and material 
logging can make the chemical difficult to recall 
at a later date, leading to unnecessary reorder- 
ing. Thus, she notes, some users actually main- 
tain two systems, “but that’s duplicate effort”. 

Furthermore, CISPro is designed to give 
every chemical container a unique barcode. 
But for users that consume bottle after bottle 
of a given solvent, the repetitive logging can 
become tedious. In that case, says Bush, users 
might reserve and reuse a handful of barcodes 
on the door of the flammables cabinet. “To keep 
an inventory as accurate as possible you have 
to consider both the chemicals and the users’ 
workflow,’ she advises. 

Whichever tracking system researchers 
choose, they can be confident at least of this: 
they need never be at a loss for their lab’s 
resources again. If nothing else, says Mymryk, 
that could save researchers some awkward 
moments: “There's nothing more embarrassing 
than having to ask for the same reagent twice.” m 


Jeffrey M. Perkel is a freelance writer in 
Pocatello, Idaho 
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STRIPPED TO ZERO 


BY STEPHEN S. POWER 


the stoop. After an hour, I grab Tommy's 

Caillou backpack and reach for his hand. 
He tucks it against his chest. It kills me, but 
I can't blame him. Id call his mother if she 
would carry a phone. Or answer if she did. 

Tommy follows me inside and asks: 
“Do I still get chips for being good?” 

“Sure; I say, turning, “if you can beat 
me. Go!” 

We race across the lobby and down a hall 
to the 24Shop, a small room lined with video 
displays. I let him dart in just ahead of me, 
and the shop says: “Good morning, Tommy.” 

“How does she always know my name, 
Daddy?” 

I shrug. To a four-year-old, even the most 
mundane technology is indistinguishable 
from magic. 

The shop has a woman's voice, soft and 
warm. I imagine her kneeling when she asks 
him: “What would you like, Tommy?” 

He looks from screen to screen. Dancing 
chips. Splashing sodas. Cookies, ice cream 
and comfort foods. The shop says: “How 
about corn flakes with milk?” A bowl of 
cereal appears. 

“No, chips,” he says. 

“Tt's much too early. Oatmeal with cinna- 
mon?” Steaming oatmeal appears. 

“No, chips! Daddy...” 

Stupid nutrition protocols. “He can have 
a snack” 

The shop says nothing. Instead, images 
flow down a screen like a slot machine 
before settling on a MoonPie. 

“Yes!” 

“And a coke?” the shop asks. 

“Why not?” I say. 

A red light blinks above the bill slot. 
Standing behind Tommy, I nod, and the light 
turns green. A MoonPie tumbles into one 
tray, a can of RC into another. 

“What do you want, Henry?” 

Tommy takes my hand. “Nothing,” I say. 
“Tm good.” 

Upstairs, Tommy turns on the TV and 
tears into his food. He’s promptly shown 
commercials for MoonPies and RC, a fact 
to which he pays no attention. 

I head for my reading room and find 

Karen sitting on the 


[= know why we bother waiting on 


> NATURE.COM toilet tank. The mir- 
Follow Futures: ror’s unplugged and 
Y @NatureFutures draped with towels. 

E3 go.nature.com/mtoodm I close the door. 
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Someone to watch over you. 


“What are you doing in here? How did you 
even get in?” 

“I spoofed a pass card.” 

“Td get you a real card” 

“Worse than phones.’ She glances through 
the high, small window. 

“He waited an hour for you.” 

“T know. I watched” 

“From the shadows? Jesus. He can’t 
remember most of your shit, but it’s starting 
to stick? 

“Tt’s not shit?” 

Thold up my hands. “Look. He misses you. 
Come on out. I’ll tell him you —” 

“Dont make excuses for me. And I’m not 
going near that T'V. This toilet’s bad enough. 
Probably reporting my weight.” She lifts her 
boots off the lid. 

“Fine. Pll call him? 

“No? 

“Then why get his hopes up? Why... 
this?” 

“I wanted to see him, but I needed to 
speak with you.” 

She slides down and stands close. She 
seems taller. And thinner. Probably the 
boots. 

‘Tm leaving,’ she says. “For good. I wont 
be coded anymore. I won't be tagged. It’s 
killing me” 

“So you'll kill him instead.” 

“He's another tag, Henry.” 

“He’ a little boy.” 

“No. We're just data sets here. Why can’t 
you see that? Is that all you want him to be?” 

Now get it. “You're not taking him” 

“We could live clean. Stripped to zero. 
Anonymous. This place I'm going —” 
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“Tll get him to his room,’ I say and grip 
the door knob. “Slither out, and the TV 
wont see you either” 

I don't worry about her snatching Tommy. 
Itd be easier for her disappear if no one 
wanted to find her, and I would. 

“Then tell him,” she says, “when he’s old 
enough, tell him that ’'m not crazy.” 

“He'll never be that old” 

My watch screen flares. Tommy knocks. 

“Daddy, I don't feel well” 

I look at Karen. She's already ducking 
behind the black shower curtain. 

I open the door. Tommy’s face is pale, 
sweaty and smeared with MoonPie. With a 
whir, the toilet lifts its lid. 

“Quickly” We kneel together on the mat, 
and Tommy spews brown-black vomit. 

Ican hear my mother say: “You just had to 
let him eat all that junk, didn’t you?” 

The toilet expresses a milky foam that 
bonds with the vomit, then it vacuums both 
away. I wipe Tommy’s mouth with a tissue as 
the scent of vanilla fills the room. 

“Smells like Mommy,’ he says. 

“Yeah.” I used to love her vanilla perfume. 
“T could set the vents to vanilla too” 

“No, I want Mommy,’ 

“T know? I rub his back. 

“Why didi't she come?” Tommy slams the 
toilet lid down. “Where is she?” 

I take his wrists and turn him so I can look 
him in the eyes. “Do you love her?” 

He nods. 

“Then she’s always nearby.’ 

“Like in the shower?” 

“Ha! Exactly. Come on. Let’s get a new 
shirt on you.” 

I pick Tommy up and take him through to 
his room. While he paws through a drawer, I 
hear her footstep outside. I smell the vanilla 
again, my stomach twists, and, despite 
everything, I want her to rush in and grab 
us both. So when the front door clicks, ’m 
horribly relieved, like someone watching his 
terminal partner finally die. 

Tommy pulls out his Batman T-shirt. I 
bend him into it. We go to the living room 
and flop down in a heap before the TV. The 
first commercial is for vanilla air fresheners. 

It’s on every channel. m 


Stephen S. Power’ novel The Dragon 
Round will be published by Simon451 in 
June 2016. His stories have recently appeared 
at AE and Daily Science Fiction with many 
forthcoming. He tweets at @stephenspower 
and his site is stephenspower.com. 
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